Abstract

The measurement of the similarity of RNA secondary structures, and in general of 
contact structures, of a fixed length has several specific applications. For instance, it 
is used in the analysis of the ensemble of suboptimal secondary structures generated by 
a given algorithm on a given RNA sequence, and in the comparison of the secondary 
structures predicted by different algorithms on a given RNA molecule. It is also a useful 
tool in the quantitative study of sequence-structure maps. A way to measure this similarity 
is by means of metrics. In this paper we introduce a new class of metrics $d_m$, $m \geq 3$, on
the set of all contact structures of a fixed length, based on their representation by means 
of edge ideals in a polynomial ring. These metrics can be expressed in terms of Hilbert 
functions of monomial ideals, which allows the use of several public domain computer 
algebra systems to compute them. We study some abstract properties of these metrics, 
and we obtain explicit descriptions of them for m = 3, 4 on arbitrary contact structures 
and for m = 5, 6 on RNA secondary structures. 

Keywords: contact structure, RNA secondary structure, metric, distance, monomial 
ideal, Hilbert function. 



1 Introduction 

As is well known, in the cell and in vitro, RNA molecules and proteins fold into three-dimensional
structures, which determine their biochemical function. A central problem in
molecular biology is the study of these structures, their prediction and comparison. As different 
levels of precision are suitable for different problems, we can sometimes forget about the detailed 
description of the three-dimensional structure of a biopolymer and simply focus our attention 
on what has been called its contact structure: the set of all pairs of monomers (nucleotides in 
RNA molecules, amino acids in proteins) that are spatial neighbors in the three-dimensional
structure [4]. If we assume the monomers are numbered from 1 to $n$ along the backbone of the
polymer, then a contact structure can be understood as an undirected graph without multiple
edges or self-loops with set of nodes $\{1, \ldots, n\}$: its edges are consistently called contacts and
its number $n$ of nodes its length.

The secondary structures of RNA molecules form a special class of contact structures. In
them, contacts represent the hydrogen bonds between pairs of bases that hold together the
three-dimensional structure. A hydrogen bond can only form between bases that are several
positions apart in the chain, but we shall not take this restriction into account here and we shall
only impose that a contact can exist only between non-consecutive bases. A restriction is added
to the definition of RNA secondary structure: a base can pair with at most one base.
This restriction is called the unique bonds condition and it is specific to secondary structures.
It is usual to impose a further restriction on RNA secondary structures, by forbidding the
existence of (pseudo)knots: a contact between bases at the $i$th and $j$th positions in the backbone
cannot coexist with a contact between bases at the $k$th and $l$th positions if $i < k < j < l$.
This restriction has its origin in the first dynamic programming methods to predict RNA 
secondary structures [25, 27, 28], but since real RNA structures can contain knots, which are 
moreover important structural elements in many RNA molecules, and their existence does not 
compromise our models, we shall not impose this restriction here. 

Contact structures with unique bonds can also be used to represent the basic building blocks
of protein structures, like $\alpha$-helices, $\beta$-sheets and $\beta$- and $\Omega$-turns (thus called protein secondary
structures), which are also held together by means of hydrogen bonds between non-consecutive
amino acids.

But, beyond secondary structures, the representation of the neighborhood in three-dimensio- 
nal structures of RNA molecules and proteins needs contact structures without unique bonds. 
The full three-dimensional structure of RNA molecules contains contacts that violate the unique 
bonds condition, like base triplets and guanine platforms [1, 16]. And in the tertiary structure 
of a protein, represented for instance by means of a self-avoiding walk in a lattice (i.e., a path
in $\mathbb{N}^3$ that does not visit the same node more than once [14]), one amino acid can be a spatial
neighbor of several amino acids [5, 7]. But even in this general case, the existence of contacts
between pairs of monomers that are next to each other in the backbone is still forbidden in 
contact structures, because their spatial closeness can be understood as a consequence of their 
position in the backbone. 

As we mentioned, an important problem in molecular biology is the comparison of the three- 
dimensional structures formed by RNA molecules and proteins, because it is assumed that 
a preserved three-dimensional structure corresponds to a preserved function. Moreover, the 
measurement of the similarity of contact structures on biopolymers of a fixed length has an 
interest in itself. For instance, it can be used in the analysis of the ensemble of suboptimal
solutions provided by a given algorithm, such as Zuker's algorithm [26], to the problem
of determining the secondary structure of a given RNA molecule; see [27, 17]. It can also be used
to compare the output of different prediction algorithms applied to the same RNA molecule 
or protein, to assess their performance. This similarity measurement lies also at the basis of 
the study of the mapping that assigns to each RNA molecule or protein the structure it folds 
into [9, 21] and it can be used in the study of phenotype spaces [10]. 

The similarity of contact structures can be quantified by means of metrics on the set of all 
contact structures of a given length. For instance, with the purpose of comparing suboptimal 
solutions to the RNA secondary structure prediction problem in order to reduce the number of
alternate structures obtained by his algorithm, Zuker introduced from the very beginning his
metric $d_Z$ [26, 27] and more recently the mountain metrics [17]. Tree editing distances have
also been used in this context [13, 17, 20].

Reidys and Stadler defined in their seminal paper [18] on algebraic models of biopolymer 
structures three metrics on RNA secondary structures of fixed length n, based on their repre- 
sentations as involutions and as permutation subgroups, and on Magarshak's matrix represen- 
tation [15], and they discussed their biophysical relevance. These metrics have been recently 
analyzed from the mathematical point of view [8, 19]. 

Since their models cannot be used to represent in a one-to-one way contact structures without 
unique bonds, Reidys and Stadler's metrics cannot be extended to the set of arbitrary contact 
structures of a fixed length. In this paper we overcome this drawback by switching from
subgroups of the symmetric group $S_n$ to monomial ideals of a polynomial ring in $n$ variables.
More specifically, we represent a contact structure by means of its edge ideal. Edge ideals are
a quite popular tool in commutative algebra to represent graphs and to study their properties
[23, 24]. By using them, we generalize the construction of Reidys and Stadler's subgroup metric,
defined through their permutation subgroups model, to define a family of metrics $(d_m)_{m \geq 3}$ on the set
of all contact structures of a fixed length. To our knowledge, these are the first metrics
defined on arbitrary contact structures of a fixed length that are independent of any notion of
graph editing. We express these metrics in terms of Hilbert functions, which makes them easily
computable using several public domain computer algebra systems such as CoCoA [3]
or Macaulay [11]. We also obtain explicit expressions for several of these metrics on contact
and RNA secondary structures, which allow one to grasp the notion of similarity they measure.

We hope that our metrics will increase the range of sensible metrics available in the applications
of the comparison of structures of a fixed length mentioned above: as Moulton, Zuker et
al. point out, "[...] generally speaking, it is probably safest to try as many metrics as possible"
[17, p. 290].

2 Preliminaries 

In this section we recall some definitions and facts on contact and RNA secondary structures,
and we take the opportunity to fix some notations and conventions that we shall use henceforth,
usually without any further notice.

Contact structures and RNA secondary structures. From now on, let $[n]$ denote the
set $\{1, \ldots, n\}$, for every positive integer $n$. We begin by recalling the definition of contact
structure from [18, 22]; contact structures are also called diagrams in [12].

Definition 1 A contact structure of length $n$ is an undirected graph without multiple edges
or self-loops $\Gamma = ([n], Q)$, for some $n \geq 1$, whose arcs $\{i, j\} \in Q$, called contacts, satisfy the
following condition:

i) For every $i \in [n]$, $\{i, i+1\} \notin Q$.

A contact structure has unique bonds when it satisfies the following extra condition:

ii) For every $i \in [n]$, if $\{i, j\}, \{i, k\} \in Q$, then $j = k$.

Condition (i) translates the impossibility of a contact between two consecutive monomers,
while condition (ii) translates the unique bonds condition in RNA secondary structures
mentioned in the Introduction. We shall call the contact structures with unique bonds RNA
secondary structures. As we mentioned in the Introduction, the conventional definition of RNA
secondary structure moreover forbids the existence of pseudoknots (pairs of contacts $\{i, j\}$ and
$\{k, l\}$ such that $i < k < j < l$), but we shall not impose this restriction here.

We shall denote from now on a contact {j, k} by j-k or k-j, without distinction. A node is 
said to be isolated in a contact structure when it is not involved in any contact. 

We shall often represent specific RNA secondary structures without pseudoknots by means 
of their bracket representation [13], obtained by replacing in the sequence [n] each contact i-j 
with i < j by a "(" in the ith position and a ")" in the jth position, and each isolated node 
by a dot in the corresponding position. For instance, 

((((((...)))))..((...)).) 

represents the secondary structure 

([25], {1-25, 2-14, 3-13, 4-12, 5-11, 6-10, 17-23, 18-22}). 
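
The bracket representation and the two conditions of Definition 1 can be illustrated with a short Python sketch. This is only an illustration under our own conventions: the function names `parse_brackets`, `is_contact_structure` and `has_unique_bonds` are hypothetical, and contacts are stored as pairs `(i, j)` with `i < j`.

```python
def parse_brackets(s):
    """Return (n, contacts) for a pseudoknot-free bracket string; nodes are 1-based."""
    stack, contacts = [], set()
    for pos, ch in enumerate(s, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            contacts.add((stack.pop(), pos))  # opening position < closing position
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return len(s), contacts


def is_contact_structure(n, contacts):
    """Condition (i): nodes lie in [n] and no contact joins consecutive nodes."""
    return all(1 <= i < j <= n and j - i >= 2 for i, j in contacts)


def has_unique_bonds(contacts):
    """Condition (ii): every node is involved in at most one contact."""
    nodes = [v for c in contacts for v in c]
    return len(nodes) == len(set(nodes))


n, Q = parse_brackets("((((((...)))))..((...)).)")
```

Here `Q` is the set of eight contacts listed above, and both checks succeed for it.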

Knotted RNA secondary structures admit a similar representation, using different types of 
brackets to represent contacts in order to avoid ambiguities. 

Given two contact structures of the same length $\Gamma_1 = ([n], Q_1)$, $\Gamma_2 = ([n], Q_2)$, their union
is the contact structure

$$\Gamma_1 \cup \Gamma_2 = ([n], Q_1 \cup Q_2).$$

From now on, and unless otherwise stated, given any contact structure $\Gamma$ or $\Gamma_i$, $i = 1, 2, \ldots$,
we shall always denote its set of contacts by $Q$ or $Q_i$, respectively.

Let $\mathcal{C}_n$ and $\mathcal{S}_n$ denote the sets of all contact structures and of all RNA secondary structures
of length $n$, respectively.

Subgroup metric. For every $n \geq 1$, let $S_n$ be the symmetric group of permutations of $[n]$.
In [18], Reidys and Stadler associated to every RNA secondary structure $\Gamma \in \mathcal{S}_n$ the subgroup
$G(\Gamma)$ of $S_n$ generated by the set of the transpositions corresponding to the contacts in $\Gamma$:

$$G(\Gamma) = \langle \{(i, j) \mid i\text{-}j \in Q\} \rangle.$$

They also proved that the mapping $\Gamma \mapsto G(\Gamma)$ is an embedding of $\mathcal{S}_n$ into the set $Sub(S_n)$ of
subgroups of $S_n$, and they used this representation of RNA secondary structures as permutation
subgroups to define the following subgroup metric:



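$$d_{sgr} : \mathcal{S}_n \times \mathcal{S}_n \to \mathbb{R}, \qquad (\Gamma_1, \Gamma_2) \mapsto \ln \left| \frac{G(\Gamma_1) \cdot G(\Gamma_2)}{G(\Gamma_1) \cap G(\Gamma_2)} \right|.$$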



In [19] it was proved that this metric simply measures, up to a scalar factor, the cardinal
$|Q_1 \Delta Q_2|$ of the symmetric difference of the sets of contacts.

Unfortunately, if we extend the mapping $G$ to the set $\mathcal{C}_n$ of all contact structures of length
$n$, we no longer obtain an embedding into $Sub(S_n)$, as the following easy example shows.

Example 1 Let $\Gamma_1 = ([5], Q_1)$ and $\Gamma_2 = ([5], Q_2)$ be contact structures with sets of contacts

$$Q_1 = \{1\text{-}3, 3\text{-}5\}, \qquad Q_2 = \{1\text{-}5, 3\text{-}5\};$$

see Fig. 1. Then $G(\Gamma_1) = \langle (1,3), (3,5) \rangle$ and $G(\Gamma_2) = \langle (1,5), (3,5) \rangle$ are both equal to

$$\{\mathrm{Id}, (1,3), (1,5), (3,5), (1,3,5), (1,5,3)\}.$$
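
Example 1 can be checked mechanically. The following Python sketch builds the subgroup generated by a set of transpositions by closing the identity under left multiplication by the generators; the helper names and the encoding of a permutation as a tuple of images are our own assumptions, not notation from [18].

```python
def transposition(n, i, j):
    """Permutation of {1..n} swapping i and j, stored as p with p[k-1] = image of k."""
    p = list(range(1, n + 1))
    p[i - 1], p[j - 1] = j, i
    return tuple(p)


def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[q[k] - 1] for k in range(len(p)))


def generated_subgroup(n, contacts):
    """Subgroup of S_n generated by the transpositions (i, j) of the given contacts."""
    identity = tuple(range(1, n + 1))
    gens = [transposition(n, i, j) for i, j in contacts]
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:  # new element: keep expanding from it
                group.add(h)
                frontier.append(h)
    return group


G1 = generated_subgroup(5, [(1, 3), (3, 5)])
G2 = generated_subgroup(5, [(1, 5), (3, 5)])
```

Both calls return the same six permutations, although the two contact structures are different.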




This entails in particular that the subgroup metric, when extended to the set $\mathcal{C}_n$, yields only
a pseudodistance: it is nonnegative and symmetric and satisfies the triangular inequality, but
$d_{sgr}(\Gamma_1, \Gamma_2) = 0$ does not imply $\Gamma_1 = \Gamma_2$. The scope of the failure of the separability condition
is determined by the following result.

Proposition 2 For every $\Gamma_1, \Gamma_2 \in \mathcal{C}_n$, $d_{sgr}(\Gamma_1, \Gamma_2) = 0$ if and only if for every $i\text{-}j \in \Gamma_1$ there
exists a chain of contacts $k_1\text{-}k_2, k_2\text{-}k_3, \ldots, k_{m-2}\text{-}k_{m-1}, k_{m-1}\text{-}k_m$ in $\Gamma_2$ with $m \geq 2$, $k_1 = i$ and
$k_m = j$, and vice versa, for every $i\text{-}j \in \Gamma_2$ there is a similar chain of contacts in $\Gamma_1$ going from
$i$ to $j$.

Proof. By [18, Thm. 5], $d_{sgr}(\Gamma_1, \Gamma_2) = 0$ if and only if $G(\Gamma_1) = G(\Gamma_2)$, i.e., if and only if every
transposition corresponding to a contact in $\Gamma_1$ is a product of transpositions corresponding
to contacts in $\Gamma_2$, and vice versa, every transposition corresponding to a contact in $\Gamma_2$ is a
product of transpositions corresponding to contacts in $\Gamma_1$, a condition that is equivalent to the
one given in the statement. ■

Orbits. Reidys and Stadler also represented an RNA secondary structure $\Gamma \in \mathcal{S}_n$ with set of
contacts $Q = \{i_1\text{-}j_1, \ldots, i_k\text{-}j_k\}$ by the involution (see footnote 1)

$$\pi(\Gamma) = \prod_{t=1}^{k} (i_t, j_t) \in S_n.$$

They also proved that this construction yields an embedding $\pi : \mathcal{S}_n \hookrightarrow S_n$, which they used
to induce metrics on $\mathcal{S}_n$ from metrics on $S_n$ [12, 18].

For every $\Gamma_1, \Gamma_2 \in \mathcal{S}_n$, let $D(\Gamma_1, \Gamma_2) = \langle \pi(\Gamma_1), \pi(\Gamma_2) \rangle \in Sub(S_n)$ be the dihedral subgroup
of $S_n$ generated by the involutions associated to them. This subgroup acts on $[n]$. The orbits
induced by this action can be understood as subsets $\{i_1, i_2, \ldots, i_m\} \subseteq [n]$, $m \geq 1$, such that

$$i_1\text{-}i_2, i_2\text{-}i_3, \ldots, i_{m-1}\text{-}i_m \in Q_1 \cup Q_2$$

and maximal with this property, i.e., such that any other contact in $Q_1 \cup Q_2$ involving some
element of this subset can only be $i_1\text{-}i_m$. Notice that these orbits are exactly the connected
components of the graph $\Gamma_1 \cup \Gamma_2$. The unique bonds condition (or, in group-theoretical terms,
the fact that $\pi(\Gamma_1)$ and $\pi(\Gamma_2)$ are involutions) implies that if $\{i_1, i_2, \ldots, i_m\}$ is such an orbit,
then $i_1\text{-}i_2, i_3\text{-}i_4, \ldots$ belong to one of the sets $Q_1$ or $Q_2$ and $i_2\text{-}i_3, i_4\text{-}i_5, \ldots$ belong to the other
one.

Footnote 1. Notice that this product is only well-defined if the transpositions appearing in it commute with each other,
and thus this definition does not make sense for arbitrary contact structures, at least unless some convention is
introduced on the order in which these transpositions must be composed; we shall not consider this problem here.

Such an orbit is cyclic if $m = 2$ and $i_1\text{-}i_2 \in Q_1 \cap Q_2$, or $m \geq 3$ and $i_1\text{-}i_m \in Q_1 \cup Q_2$, and it is
linear in all other cases: see Fig. 2. We shall call the cardinal of an orbit its length. The length
of a cyclic orbit is always even: if $i_1\text{-}i_2 \in Q_1$ in a cyclic orbit $\{i_1, i_2, \ldots, i_m\}$, then $i_1\text{-}i_m \in Q_2$
and hence $i_{m-1}\text{-}i_m \in Q_1$.




Figure 2: A cyclic orbit of length m (a) and a linear orbit of length m (b). 

An orbit is trivial when it is a singleton: it is a linear orbit consisting of a node that
is isolated in both $\Gamma_1$ and $\Gamma_2$. If $\{i_1, i_2, \ldots, i_m\}$ is a non-trivial linear orbit with
$i_1\text{-}i_2, i_2\text{-}i_3, \ldots, i_{m-1}\text{-}i_m \in Q_1 \cup Q_2$ and $i_1\text{-}i_m \notin Q_1 \cup Q_2$, then $i_1$ and $i_m$ are its end points.

We shall say that a contact $i\text{-}j \in Q_1 \cup Q_2$ is involved in an orbit when its vertices $i, j$ belong
to this orbit. Every contact in $Q_1 \cup Q_2$ is involved in one and only one orbit, and a contact
belongs to $Q_1 \Delta Q_2$ if and only if it is involved in a linear orbit or in a cyclic orbit of length
$m > 2$.

Let, for every $k \geq 2$,

$$\Lambda^{(m)}, \qquad \Lambda_{\geq k}, \qquad \Theta^{(m)}$$

denote, respectively, the number of linear orbits of length $m$, the number of linear orbits of
length $m \geq k$ and the number of cyclic orbits of length $m$ induced by the action of $D(\Gamma_1, \Gamma_2)$
on $[n]$. Since a cyclic orbit of length $m$ involves $m$ contacts, and a linear orbit of length $m$
involves $m - 1$ contacts, we have that

$$|Q_1 \Delta Q_2| = \sum_{m \geq 4} m\, \Theta^{(m)} + \sum_{m \geq 2} (m-1)\, \Lambda^{(m)}.$$
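
Since the orbits are the connected components of the union graph, they can be computed with a plain graph search. The sketch below is an illustration only: `orbit_counts` is a hypothetical helper, it assumes both inputs satisfy the unique bonds condition, and the two structures of length 10 are made-up test data chosen to contain one cyclic orbit of length 4, one cyclic orbit of length 2, one linear orbit of length 3 and one trivial orbit.

```python
from collections import Counter


def orbit_counts(n, q1, q2):
    """Count linear and cyclic orbits of the union of two unique-bond structures.

    Returns two Counters mapping an orbit length m to the number of
    linear (resp. cyclic) orbits of that length.
    """
    q1 = {tuple(sorted(c)) for c in q1}
    q2 = {tuple(sorted(c)) for c in q2}
    union = q1 | q2
    neighbours = {v: set() for v in range(1, n + 1)}
    for i, j in union:
        neighbours[i].add(j)
        neighbours[j].add(i)

    linear, cyclic, seen = Counter(), Counter(), set()
    for start in range(1, n + 1):
        if start in seen:
            continue
        orbit, stack = set(), [start]
        while stack:  # depth-first search of one connected component
            v = stack.pop()
            if v not in orbit:
                orbit.add(v)
                stack.extend(neighbours[v] - orbit)
        seen |= orbit
        m = len(orbit)
        edges = [c for c in union if c[0] in orbit]
        shared = any(c in q1 and c in q2 for c in edges)
        # With unique bonds a component is a path (m - 1 contacts) or a cycle (m contacts).
        if (m == 2 and shared) or (m >= 3 and len(edges) == m):
            cyclic[m] += 1
        else:
            linear[m] += 1
    return linear, cyclic


Q1 = {(1, 3), (5, 7), (2, 4), (8, 10)}
Q2 = {(3, 5), (1, 7), (4, 6), (8, 10)}
linear, cyclic = orbit_counts(10, Q1, Q2)
lhs = len(Q1 ^ Q2)
rhs = (sum(m * c for m, c in cyclic.items() if m >= 4)
       + sum((m - 1) * c for m, c in linear.items() if m >= 2))
```

For this data both sides of the counting identity equal 6.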

3 A family of metrics based on edge ideals 

Let $n$ be from now on an integer greater than 2. Let $\mathcal{M}(x_1, \ldots, x_n)$, or simply $\mathcal{M}(x)$, be the set
of all monomials in the variables $x_1, \ldots, x_n$. We shall denote a monomial $x_1^{\alpha_1} \cdots x_n^{\alpha_n} \in \mathcal{M}(x)$
by $x^{(\alpha_1, \ldots, \alpha_n)}$ or simply by $x^{\alpha}$ if we let $\alpha$ stand for the $n$-tuple $(\alpha_1, \ldots, \alpha_n)$. The total degree
of a monomial $x^{(\alpha_1, \ldots, \alpha_n)}$ is $\sum_{i=1}^{n} \alpha_i$. For every $m \geq 0$,

• let $\mathcal{M}(x)^{(m)}$ be the set of all monomials in $\mathcal{M}(x)$ of total degree $m$, and

• let $\mathcal{M}(x)_m$ be the set of all monomials in $\mathcal{M}(x)$ of total degree $\leq m$.

Recall that

$$|\mathcal{M}(x)_m| = \binom{n+m}{n} \qquad \text{and} \qquad |\mathcal{M}(x)^{(m)}| = \binom{n+m-1}{n-1}.$$

Let $\mathbb{F}_2$ be the field $\mathbb{Z}/2\mathbb{Z}$ and $\mathbb{F}_2[x_1, \ldots, x_n]$, or simply $\mathbb{F}_2[x]$, the ring of polynomials in the
variables $x_1, \ldots, x_n$ with coefficients in $\mathbb{F}_2$. Let $Id(\mathbb{F}_2[x])$ denote the set of ideals of $\mathbb{F}_2[x]$. For
every $I \in Id(\mathbb{F}_2[x])$ and for every $m \geq 0$,

• let $M(I) = I \cap \mathcal{M}(x)$ be the set of all monomials that belong to $I$;

• let $M(I)^{(m)} = I \cap \mathcal{M}(x)^{(m)}$ be the set of all monomials of total degree $m$ that belong
to $I$;

• let $M(I)_m = I \cap \mathcal{M}(x)_m$ be the set of all monomials of total degree $\leq m$ that belong
to $I$;

• let $C(I) = \mathcal{M}(x) - M(I)$ be the set of all monomials that do not belong to $I$; and

• let $C(I)_m = C(I) \cap \mathcal{M}(x)_m$ be the set of all monomials of total degree $\leq m$ that do not
belong to $I$.

An ideal $I$ of $\mathbb{F}_2[x]$ is monomial when it is generated by a set of monomials. It should be
recalled that, given a monomial ideal $I$ generated by a set of monomials $M$, the monomials in
$M(I)$ are exactly those that are divisible by some monomial in $M$ and the polynomials in $I$ are
exactly the linear combinations (with coefficients in $\mathbb{F}_2$) of monomials in $M(I)$; in particular,
for every two monomial ideals $I$ and $J$ of $\mathbb{F}_2[x]$, $I = J$ if and only if $M(I) = M(J)$.

Definition 2 For every $\Gamma = ([n], Q) \in \mathcal{C}_n$, the edge ideal $I_\Gamma$ of $\Gamma$ is the monomial ideal of
$\mathbb{F}_2[x]$ generated by the products of pairs of variables whose indexes form a contact in $\Gamma$:

$$I_\Gamma = \langle \{x_i x_j \mid i\text{-}j \in Q\} \rangle.$$

Proposition 3 The mapping $I : \mathcal{C}_n \to Id(\mathbb{F}_2[x])$ that sends every $\Gamma \in \mathcal{C}_n$ to its edge ideal is
an embedding.

Proof. For every $\Gamma \in \mathcal{C}_n$, the monomials in $I_\Gamma$ are exactly those divisible by some $x_i x_j$ with
$i\text{-}j \in Q$. This implies that

$$M(I_\Gamma)^{(2)} = \{x_i x_j \mid i\text{-}j \in Q\},$$

and therefore $\Gamma$ is uniquely determined by $M(I_\Gamma)^{(2)}$. ■
Given two contact structures $\Gamma_1, \Gamma_2 \in \mathcal{C}_n$, it is clear that

$$I_{\Gamma_1} + I_{\Gamma_2} = \langle \{x_i x_j \mid i\text{-}j \in Q_1\} \cup \{x_i x_j \mid i\text{-}j \in Q_2\} \rangle = I_{\Gamma_1 \cup \Gamma_2}.$$

As far as $I_{\Gamma_1} \cap I_{\Gamma_2}$ goes, it is straightforward to prove that it is generated by

$$\{x_i x_j \mid i\text{-}j \in Q_1 \cap Q_2\} \cup \{x_i x_j x_k \mid i\text{-}j \in Q_1 - Q_2,\ j\text{-}k \in Q_2 - Q_1\}$$
$$\cup\ \{x_i x_j x_k x_l \mid i\text{-}j \in Q_1 - Q_2,\ k\text{-}l \in Q_2 - Q_1,\ \{i, j\} \cap \{k, l\} = \emptyset\}.$$

Using a construction similar to the one introduced by Reidys and Stadler for subgroups, we
want to measure the difference between two contact structures $\Gamma_1, \Gamma_2 \in \mathcal{C}_n$ by means of the
quotient $(I_{\Gamma_1} + I_{\Gamma_2})/(I_{\Gamma_1} \cap I_{\Gamma_2})$. Notice that this quotient is a singleton if and only if $I_{\Gamma_1} = I_{\Gamma_2}$,
i.e., if and only if $\Gamma_1 = \Gamma_2$. Unfortunately, in all other cases this quotient is infinite: if a
monomial $x^{\alpha}$ belongs to, say, $I_{\Gamma_1} - I_{\Gamma_2}$, then all its powers define pairwise different equivalence
classes modulo $I_{\Gamma_1} \cap I_{\Gamma_2}$. Thus, to obtain a "finite distance" we move to quotients of $\mathbb{F}_2[x]$.

For every $n \geq 1$ and $m \geq 3$, let us consider the quotient ring

$$R_{n,m} = \mathbb{F}_2[x_1, \ldots, x_n] / \langle \mathcal{M}(x)^{(m)} \rangle,$$

and let $\pi_m : \mathbb{F}_2[x_1, \ldots, x_n] \to R_{n,m}$ be the corresponding quotient ring homomorphism. For
every $I \in Id(\mathbb{F}_2[x_1, \ldots, x_n])$, let $\pi_m(I)$ be the image of $I$ in $R_{n,m}$.

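Proposition 4 For every $m \geq 3$, the mapping $d'_m : \mathcal{C}_n \times \mathcal{C}_n \to \mathbb{R}$ defined by

$$d'_m(\Gamma_1, \Gamma_2) = \log_2 \left| \frac{\pi_m(I_{\Gamma_1}) + \pi_m(I_{\Gamma_2})}{\pi_m(I_{\Gamma_1}) \cap \pi_m(I_{\Gamma_2})} \right|$$

is a metric on $\mathcal{C}_n$.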



Proof. When we perform the quotient $R_{n,m} = \mathbb{F}_2[x] / \langle \mathcal{M}(x)^{(m)} \rangle$, all monomials with total
degree greater than or equal to $m$ are cancelled. Then, each element in $R_{n,m}$ has a unique
representative that is a linear combination with coefficients in $\mathbb{F}_2$ of monomials of total degree
at most $m - 1$. Since $\mathbb{F}_2$ is a finite field, this implies that $R_{n,m}$ is finite, and in particular it is
a finite commutative group with the sum of quotient classes of polynomials as operation. Let
$Sub(R_{n,m})$ denote its set of subgroups.

On the other hand, since $m \geq 3$, the quotient homomorphism $\pi_m$ does not identify any
monomial of total degree 2 with any other monomial. Thus, $\pi_m(I_{\Gamma_1}) = \pi_m(I_{\Gamma_2})$ implies
$M(I_{\Gamma_1})^{(2)} = M(I_{\Gamma_2})^{(2)}$ and hence, as we saw in Proposition 3, $\Gamma_1 = \Gamma_2$. In other words, the
mapping $\pi_m \circ I : \mathcal{C}_n \to Sub(R_{n,m})$ sending every $\Gamma \in \mathcal{C}_n$ to $\pi_m(I_\Gamma)$ is an embedding.

Then, since by [18, Thm. 5] the mapping

$$\delta(I, J) = \log_2 \left| \frac{I + J}{I \cap J} \right|, \qquad I, J \in Sub(R_{n,m}),$$

is a metric on $Sub(R_{n,m})$, the mapping

$$d'_m(\Gamma_1, \Gamma_2) = \delta(\pi_m(I_{\Gamma_1}), \pi_m(I_{\Gamma_2})), \qquad \Gamma_1, \Gamma_2 \in \mathcal{C}_n,$$

is a metric on $\mathcal{C}_n$, as we claimed. ■

We have used $\log_2$ instead of $\ln$ in the definition of $d'_m$ in order to avoid unnecessary scalar
factors: cf. [19, Prop. 4].

These metrics $d'_m$ have a simple description in terms of symmetric differences of sets of
monomials.

Proposition 5 For every $m \geq 3$ and for every $\Gamma_1, \Gamma_2 \in \mathcal{C}_n$,

$$d'_m(\Gamma_1, \Gamma_2) = |M(I_{\Gamma_1})_{m-1}\, \Delta\, M(I_{\Gamma_2})_{m-1}|.$$
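Proof. Notice that, for every $I \in Id(\mathbb{F}_2[x])$,

$$\pi_m(I) = \pi_m\bigl(I + \langle \mathcal{M}(x)^{(m)} \rangle\bigr).$$

Then, for every $\Gamma_1, \Gamma_2 \in \mathcal{C}_n$,

$$\begin{aligned}
\frac{\pi_m(I_{\Gamma_1}) + \pi_m(I_{\Gamma_2})}{\pi_m(I_{\Gamma_1}) \cap \pi_m(I_{\Gamma_2})}
&= \frac{\pi_m(I_{\Gamma_1} + \langle \mathcal{M}(x)^{(m)} \rangle) + \pi_m(I_{\Gamma_2} + \langle \mathcal{M}(x)^{(m)} \rangle)}{\pi_m(I_{\Gamma_1} + \langle \mathcal{M}(x)^{(m)} \rangle) \cap \pi_m(I_{\Gamma_2} + \langle \mathcal{M}(x)^{(m)} \rangle)} \\
&= \frac{\pi_m\bigl((I_{\Gamma_1} + \langle \mathcal{M}(x)^{(m)} \rangle) + (I_{\Gamma_2} + \langle \mathcal{M}(x)^{(m)} \rangle)\bigr)}{\pi_m\bigl((I_{\Gamma_1} + \langle \mathcal{M}(x)^{(m)} \rangle) \cap (I_{\Gamma_2} + \langle \mathcal{M}(x)^{(m)} \rangle)\bigr)} \\
&\cong \frac{(I_{\Gamma_1} + \langle \mathcal{M}(x)^{(m)} \rangle) + (I_{\Gamma_2} + \langle \mathcal{M}(x)^{(m)} \rangle)}{(I_{\Gamma_1} + \langle \mathcal{M}(x)^{(m)} \rangle) \cap (I_{\Gamma_2} + \langle \mathcal{M}(x)^{(m)} \rangle)} \\
&= \frac{I_{\Gamma_1} + I_{\Gamma_2} + \langle \mathcal{M}(x)^{(m)} \rangle}{(I_{\Gamma_1} \cap I_{\Gamma_2}) + \langle \mathcal{M}(x)^{(m)} \rangle},
\end{aligned}$$

where the equality

$$(I_{\Gamma_1} + \langle \mathcal{M}(x)^{(m)} \rangle) \cap (I_{\Gamma_2} + \langle \mathcal{M}(x)^{(m)} \rangle) = (I_{\Gamma_1} \cap I_{\Gamma_2}) + \langle \mathcal{M}(x)^{(m)} \rangle$$

used in the last step holds because $I_{\Gamma_1}$, $I_{\Gamma_2}$ and $\langle \mathcal{M}(x)^{(m)} \rangle$ are monomial ideals.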
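To simplify the notations, set

$$J = I_{\Gamma_1} + I_{\Gamma_2} + \langle \mathcal{M}(x)^{(m)} \rangle, \qquad K = (I_{\Gamma_1} \cap I_{\Gamma_2}) + \langle \mathcal{M}(x)^{(m)} \rangle,$$

so that

$$\frac{\pi_m(I_{\Gamma_1}) + \pi_m(I_{\Gamma_2})}{\pi_m(I_{\Gamma_1}) \cap \pi_m(I_{\Gamma_2})} \cong \frac{J}{K}.$$

These ideals $J$ and $K$ are also monomial and

$$M(J) = M(I_{\Gamma_1})_{m-1} \cup M(I_{\Gamma_2})_{m-1} \cup \bigcup_{r > m-1} \mathcal{M}(x)^{(r)},$$
$$M(K) = \bigl(M(I_{\Gamma_1})_{m-1} \cap M(I_{\Gamma_2})_{m-1}\bigr) \cup \bigcup_{r > m-1} \mathcal{M}(x)^{(r)}.$$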

A polynomial belongs to $J$ (resp. to $K$) if and only if it is a linear combination, with coefficients
in $\mathbb{F}_2$, of elements of $M(J)$ (resp. of $M(K)$). This implies that every quotient class in $J/K$ has
a unique representative of the form $\sum_{x^{\alpha} \in M_0} x^{\alpha}$ for some finite subset $M_0$ of $M(J) - M(K)$
(the zero class corresponds to $M_0 = \emptyset$). Since

$$M(J) - M(K) = \bigl(M(I_{\Gamma_1})_{m-1} \cup M(I_{\Gamma_2})_{m-1}\bigr) - \bigl(M(I_{\Gamma_1})_{m-1} \cap M(I_{\Gamma_2})_{m-1}\bigr) = M(I_{\Gamma_1})_{m-1}\, \Delta\, M(I_{\Gamma_2})_{m-1}$$

is a finite set, this implies that

$$\left| \frac{\pi_m(I_{\Gamma_1}) + \pi_m(I_{\Gamma_2})}{\pi_m(I_{\Gamma_1}) \cap \pi_m(I_{\Gamma_2})} \right| = \left| \frac{J}{K} \right| = 2^{|M(I_{\Gamma_1})_{m-1}\, \Delta\, M(I_{\Gamma_2})_{m-1}|},$$

as we claimed. ■

Proposition 5 allows us to express the metrics $d'_m$ in terms of Hilbert functions. For every
monomial ideal $I$ of $\mathbb{F}_2[x]$, let $H_I : \mathbb{N} \to \mathbb{N}$ be the mapping defined by

$$H_I(m) = |C(I)_m|, \qquad m \in \mathbb{N};$$

i.e., $H_I(m)$ is the number of monomials of total degree $\leq m$ that do not belong to $I$. This
mapping is called the (affine) Hilbert function of $I$. It can be computed explicitly from a given
finite set of generators of $I$ [2]; actually, several freely available computer algebra systems like,
for instance, CoCoA [3] or Macaulay [11], compute Hilbert functions.

For every contact structure $\Gamma \in \mathcal{C}_n$, let $H_\Gamma$ denote the Hilbert function of its edge ideal.

Corollary 6 For every $\Gamma_1, \Gamma_2 \in \mathcal{C}_n$ and for every $m \geq 3$,

$$d'_m(\Gamma_1, \Gamma_2) = H_{\Gamma_1}(m-1) + H_{\Gamma_2}(m-1) - 2 H_{\Gamma_1 \cup \Gamma_2}(m-1).$$

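Proof. We have that

$$\begin{aligned}
M(I_{\Gamma_1})_{m-1}\, \Delta\, M(I_{\Gamma_2})_{m-1}
&= \bigl(M(I_{\Gamma_1})_{m-1} - (M(I_{\Gamma_1})_{m-1} \cap M(I_{\Gamma_2})_{m-1})\bigr) \cup \bigl(M(I_{\Gamma_2})_{m-1} - (M(I_{\Gamma_1})_{m-1} \cap M(I_{\Gamma_2})_{m-1})\bigr) \\
&= \bigl(M(I_{\Gamma_1} + I_{\Gamma_2})_{m-1} - M(I_{\Gamma_2})_{m-1}\bigr) \cup \bigl(M(I_{\Gamma_1} + I_{\Gamma_2})_{m-1} - M(I_{\Gamma_1})_{m-1}\bigr) \\
&= \bigl(C(I_{\Gamma_2})_{m-1} - C(I_{\Gamma_1} + I_{\Gamma_2})_{m-1}\bigr) \cup \bigl(C(I_{\Gamma_1})_{m-1} - C(I_{\Gamma_1} + I_{\Gamma_2})_{m-1}\bigr)
\end{aligned}$$

and thus, this union being disjoint,

$$\begin{aligned}
|M(I_{\Gamma_1})_{m-1}\, \Delta\, M(I_{\Gamma_2})_{m-1}|
&= \bigl(|C(I_{\Gamma_2})_{m-1}| - |C(I_{\Gamma_1} + I_{\Gamma_2})_{m-1}|\bigr) + \bigl(|C(I_{\Gamma_1})_{m-1}| - |C(I_{\Gamma_1} + I_{\Gamma_2})_{m-1}|\bigr) \\
&= \bigl(H_{\Gamma_2}(m-1) - H_{\Gamma_1 \cup \Gamma_2}(m-1)\bigr) + \bigl(H_{\Gamma_1}(m-1) - H_{\Gamma_1 \cup \Gamma_2}(m-1)\bigr),
\end{aligned}$$

as we claimed. ■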
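
For small $n$ and $m$, Proposition 5 and Corollary 6 can be compared by brute-force enumeration of monomials, without any computer algebra system. The following Python sketch is only illustrative: the function names are ours, a monomial is encoded as a non-decreasing tuple of variable indices, and the two structures are those of Example 1.

```python
from itertools import combinations_with_replacement


def monomials_up_to(n, d):
    """Yield every monomial in x_1..x_n of total degree <= d.

    A monomial is a non-decreasing tuple of indices: (1, 1, 3) is x_1^2 x_3
    and () is the constant monomial 1.
    """
    for deg in range(d + 1):
        yield from combinations_with_replacement(range(1, n + 1), deg)


def in_edge_ideal(mono, contacts):
    """True when x_i x_j divides the monomial for some contact i-j."""
    support = set(mono)
    return any(i in support and j in support for i, j in contacts)


def hilbert(n, contacts, d):
    """Affine Hilbert function: monomials of degree <= d outside the edge ideal."""
    return sum(not in_edge_ideal(mo, contacts) for mo in monomials_up_to(n, d))


def d_prime_sets(n, q1, q2, m):
    """d'_m as a symmetric difference of monomial sets (Proposition 5)."""
    m1 = {mo for mo in monomials_up_to(n, m - 1) if in_edge_ideal(mo, q1)}
    m2 = {mo for mo in monomials_up_to(n, m - 1) if in_edge_ideal(mo, q2)}
    return len(m1 ^ m2)


def d_prime_hilbert(n, q1, q2, m):
    """d'_m through Hilbert functions (Corollary 6)."""
    union = set(q1) | set(q2)
    return (hilbert(n, q1, m - 1) + hilbert(n, q2, m - 1)
            - 2 * hilbert(n, union, m - 1))


Q1 = {(1, 3), (3, 5)}
Q2 = {(1, 5), (3, 5)}
values = [(d_prime_sets(5, Q1, Q2, m), d_prime_hilbert(5, Q1, Q2, m)) for m in (3, 4, 5)]
```

Both computations agree for every tested $m$; in particular, the two structures that the subgroup pseudodistance cannot separate are at distance $d'_3 = 2$.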

To close this section, we want to point out that the metrics $d'_m$ grow with $n$ and $m$, and
thus it is convenient to normalize them in order to avoid unnecessarily high figures. More
specifically, let $\Gamma_0 = ([n], \emptyset)$ be the empty RNA secondary structure of length $n$ and let $\Gamma_1$
be an RNA secondary structure of length $n$ with only one contact, say $i\text{-}j$ with $i < j$. Then
$I_{\Gamma_0} = \{0\}$, $I_{\Gamma_1} = I_{\Gamma_0} + I_{\Gamma_1} = \langle x_i x_j \rangle$, and therefore, for every $m \geq 3$,

$$d'_m(\Gamma_0, \Gamma_1) = H_{\Gamma_0}(m-1) - H_{\Gamma_1}(m-1),$$

where

$$H_{\Gamma_0}(m-1) = |\mathcal{M}(x)_{m-1}| = \binom{n+m-1}{n},$$

$$\begin{aligned}
H_{\Gamma_1}(m-1) &= |\mathcal{M}(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)_{m-1}| + |\mathcal{M}(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n)_{m-1}| \\
&\quad - |\mathcal{M}(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n)_{m-1}| \\
&= 2\binom{n+m-2}{n-1} - \binom{n+m-3}{n-2},
\end{aligned}$$

and hence

$$d'_m(\Gamma_0, \Gamma_1) = \binom{n+m-1}{n} - 2\binom{n+m-2}{n-1} + \binom{n+m-3}{n-2} = \binom{n+m-3}{n}.$$

If we take 1 as the "natural" value for the distance between $\Gamma_0$ and $\Gamma_1$, then instead of using
the metrics $d'_m$ on $\mathcal{C}_n$, we must divide them by $\binom{n+m-3}{n}$.

Definition 3 For every $m \geq 3$, the edge ideal $m$th metric on $\mathcal{C}_n$ is

$$d_m(\Gamma_1, \Gamma_2) = \frac{1}{\binom{n+m-3}{n}}\, d'_m(\Gamma_1, \Gamma_2), \qquad \Gamma_1, \Gamma_2 \in \mathcal{C}_n.$$

So, for instance, on $\mathcal{C}_n$,

$$d_3 = d'_3, \qquad d_4 = \frac{1}{n+1}\, d'_4, \qquad d_5 = \frac{2}{(n+1)(n+2)}\, d'_5, \qquad \text{and so on.}$$
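
The normalizing factor $\binom{n+m-3}{n}$ can be checked by enumeration for small cases: the normalized distance between the empty structure and a structure with a single contact should be exactly 1. The sketch below makes that check with exact rational arithmetic; `ideal_monomials` and `d_m` are hypothetical helper names, and monomials are encoded as non-decreasing tuples of variable indices.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb


def ideal_monomials(n, contacts, d):
    """Monomials of total degree <= d lying in the edge ideal of the contacts."""
    return {
        mo
        for deg in range(d + 1)
        for mo in combinations_with_replacement(range(1, n + 1), deg)
        if any(i in mo and j in mo for i, j in contacts)
    }


def d_m(n, q1, q2, m):
    """Edge ideal m-th metric: d'_m divided by binom(n + m - 3, n)."""
    raw = len(ideal_monomials(n, q1, m - 1) ^ ideal_monomials(n, q2, m - 1))
    return Fraction(raw, comb(n + m - 3, n))


# Empty structure versus one contact 1-3, for a few (n, m) pairs.
single_contact = {
    (n, m): d_m(n, set(), {(1, 3)}, m)
    for n, m in [(4, 3), (5, 4), (6, 5), (6, 6)]
}
```

Every value in `single_contact` equals 1, in agreement with the choice of normalization.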

Even after this modification, the metric $d_m$ is sensitive to $n$, in the sense that if we add to
two contact structures of a given length $n$ an isolated point, making them contact structures
of length $n + 1$, then their distance $d_m$ (for $m \geq 4$: see Proposition 7 below) may grow. For
instance, let $\Gamma_0$ be again the empty RNA secondary structure of length $n \geq 6$ and let now
$\Gamma_1 = ([n], \{1\text{-}3, 4\text{-}6\})$. Then

$$d_m(\Gamma_0, \Gamma_1) = \frac{1}{\binom{n+m-3}{n}} \bigl(H_{\Gamma_0}(m-1) - H_{\Gamma_1}(m-1)\bigr).$$

We have seen above that $H_{\Gamma_0}(m-1) = \binom{n+m-1}{n}$, and we shall see in Proposition 14 below
that (with the convention that $\binom{i}{n} = 0$ if $i < n$)

$$H_{\Gamma_1}(m-1) = \binom{n+m-1}{n} - 2\binom{n+m-3}{n} + \binom{n+m-5}{n}.$$

Therefore,

$$d_m(\Gamma_0, \Gamma_1) = 2 - \binom{n+m-5}{n} \Big/ \binom{n+m-3}{n} = 2 - \frac{(m-3)(m-4)}{(n+m-3)(n+m-4)},$$

which increases with $n$ if $m \geq 5$.
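
The dependence on $n$ in this example can be observed numerically. The sketch below compares the brute-force value of $d_m$ between the empty structure and $([n], \{1\text{-}3, 4\text{-}6\})$ with the closed form $2 - \frac{(m-3)(m-4)}{(n+m-3)(n+m-4)}$ for a few small $n$ and $m$; the helper names are hypothetical and the enumeration is only practical for small parameters.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb


def ideal_monomials(n, contacts, d):
    """Monomials of total degree <= d lying in the edge ideal of the contacts."""
    return {
        mo
        for deg in range(d + 1)
        for mo in combinations_with_replacement(range(1, n + 1), deg)
        if any(i in mo and j in mo for i, j in contacts)
    }


def d_m(n, q1, q2, m):
    """Edge ideal m-th metric, computed by enumeration."""
    raw = len(ideal_monomials(n, q1, m - 1) ^ ideal_monomials(n, q2, m - 1))
    return Fraction(raw, comb(n + m - 3, n))


def closed_form(n, m):
    """Closed expression for d_m between the empty structure and {1-3, 4-6}."""
    return 2 - Fraction((m - 3) * (m - 4), (n + m - 3) * (n + m - 4))


checks = {
    (n, m): d_m(n, set(), {(1, 3), (4, 6)}, m)
    for n in (6, 7, 8)
    for m in (3, 4, 5, 6)
}
```

For $m = 3$ and $m = 4$ the distance is 2 for every tested $n$, while for $m = 5$ it grows from $55/28$ at $n = 6$ towards 2.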

Since we are only interested in comparing contact structures of the same length, this sensi- 
tiveness of the edge ideal metrics to the length n is not a major drawback. 

4 Some computations 

In this section we shall compute explicitly some edge ideal $m$th metrics on $\mathcal{C}_n$ and $\mathcal{S}_n$, for low
values of $m$. We begin with $m = 3$.

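Proposition 7 For every $\Gamma_1, \Gamma_2 \in \mathcal{C}_n$,

$$d_3(\Gamma_1, \Gamma_2) = |Q_1 \Delta Q_2|.$$

Proof. Notice that $M(I_\Gamma)_1 = \emptyset$ for every $\Gamma \in \mathcal{C}_n$. Therefore

$$M(I_{\Gamma_1})_2\, \Delta\, M(I_{\Gamma_2})_2 = M(I_{\Gamma_1})^{(2)}\, \Delta\, M(I_{\Gamma_2})^{(2)} = \{x_i x_j \mid i\text{-}j \in (Q_1 - Q_2) \cup (Q_2 - Q_1)\}$$

and hence $|M(I_{\Gamma_1})_2\, \Delta\, M(I_{\Gamma_2})_2| = |Q_1 \Delta Q_2|$. ■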
Actually, it is not difficult to prove that, for every $\Gamma \in \mathcal{S}_n$, the mapping $G(\Gamma) \to \pi_3(I_\Gamma)$ sending
every permutation $\sigma = (i_1, j_1) \cdots (i_l, j_l) \in G(\Gamma)$, with $i_1\text{-}j_1, \ldots, i_l\text{-}j_l \in Q$, to the equivalence
class of the polynomial $x_{i_1} x_{j_1} + \cdots + x_{i_l} x_{j_l} \in I_\Gamma$ modulo $\langle \mathcal{M}(x)^{(3)} \rangle$, is an isomorphism of
groups, considering $\pi_3(I_\Gamma)$ as a subgroup of $R_{n,3}$. This is not true for arbitrary contact structures,
because in this case $G(\Gamma)$ need not be commutative, while $\pi_3(I_\Gamma)$ is always so. Therefore,
the embedding $\pi_3 \circ I : \mathcal{C}_n \hookrightarrow Sub(R_{n,3})$ generalizes the embedding $G : \mathcal{S}_n \hookrightarrow Sub(S_n)$, and
hence the metric $d_3$ generalizes (up to a scalar factor) the subgroup metric $d_{sgr}$ at a level
deeper than their raw value.

The edge ideal $m$th metrics for $m > 3$ have a much more involved expression. In their
computation we shall use the following lemma; notice that the edge ideals of contact structures
are radical monomial proper (i.e., $\neq \mathbb{F}_2[x]$) ideals.

Lemma 8 Let $I$ be a radical monomial proper ideal of $\mathbb{F}_2[x]$ and, for every $k \geq 1$, let $SF_k(I)$
be the number of square free monomials of total degree $k$ belonging to $M(I)$. Then, for every
$m \geq 0$,

$$H_I(m) = \binom{n+m}{n} - \sum_{k=1}^{m} \binom{m}{k} SF_k(I).$$

Proof. If $I$ is a radical monomial ideal, then a monomial of the form $x_{i_1}^{\alpha_{i_1}} \cdots x_{i_k}^{\alpha_{i_k}}$, with
$i_1, \ldots, i_k$ pairwise different and each $\alpha_{i_t} \geq 1$, belongs to $M(I)$ if and only if the corresponding
square free monomial $x_{i_1} \cdots x_{i_k}$ belongs to $M(I)$. Therefore, each one of the $SF_k(I)$ square
free monomials $x_{i_1} \cdots x_{i_k}$ of total degree $k \geq 1$ in $M(I)$ adds as many monomials $x_{i_1}^{\alpha_{i_1}} \cdots x_{i_k}^{\alpha_{i_k}}$
to $M(I)_m$ as there exist vectors $(\alpha_{i_1}, \ldots, \alpha_{i_k}) \in (\mathbb{N} - \{0\})^k$ such that $\sum_{t=1}^{k} \alpha_{i_t} \leq m$, and the
number of the latter is $\binom{k+m-k}{k} = \binom{m}{k}$. Since all monomials in $M(I)_m$ added in this way are
pairwise different and $1 \notin I$ by assumption, this proves that

$$|M(I)_m| = \sum_{k=1}^{m} \binom{m}{k} SF_k(I),$$

and hence

$$|C(I)_m| = |\mathcal{M}(x)_m| - |M(I)_m| = \binom{n+m}{n} - \sum_{k=1}^{m} \binom{m}{k} SF_k(I),$$

as we claimed. ■

Notice that if $\Gamma \in \mathcal{C}_n$, then $SF_1(I_\Gamma) = 0$ and $SF_2(I_\Gamma) = |Q|$.
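
Lemma 8 lends itself to a direct numerical check on edge ideals: $SF_k$ is the number of $k$-subsets of $[n]$ containing at least one contact, and the resulting value $\binom{n+m}{n} - \sum_{k=1}^{m} \binom{m}{k} SF_k$ can be compared with a brute-force count of the monomials outside the ideal. The Python sketch below does this for a small contact structure chosen by us for illustration; all function names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement
from math import comb


def contains_contact(indices, contacts):
    """True when both ends of some contact occur among the given indices."""
    s = set(indices)
    return any(i in s and j in s for i, j in contacts)


def hilbert_bruteforce(n, contacts, m):
    """Count monomials of total degree <= m outside the edge ideal."""
    return sum(
        not contains_contact(mo, contacts)
        for deg in range(m + 1)
        for mo in combinations_with_replacement(range(1, n + 1), deg)
    )


def square_free_count(n, contacts, k):
    """SF_k: square free monomials of total degree k in the edge ideal."""
    return sum(contains_contact(c, contacts) for c in combinations(range(1, n + 1), k))


def hilbert_lemma8(n, contacts, m):
    """Hilbert function through the square free counts, as in Lemma 8."""
    return comb(n + m, n) - sum(
        comb(m, k) * square_free_count(n, contacts, k) for k in range(1, m + 1)
    )


Q = {(1, 3), (3, 5), (1, 5), (2, 4)}
pairs = [(hilbert_bruteforce(5, Q, m), hilbert_lemma8(5, Q, m)) for m in range(6)]
```

The two computations coincide for every tested degree, and the first two square free counts are $0$ and $|Q| = 4$.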

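Let us compute now $d_4$ on $\mathcal{C}_n$. For every contact structure $\Gamma \in \mathcal{C}_n$, let

$$A(\Gamma) = \bigl|\{\{i\text{-}j, j\text{-}k\} \subseteq Q \mid i \neq k\}\bigr|, \qquad T(\Gamma) = \bigl|\{\{i, j, k\} \subseteq [n] \mid i\text{-}j, j\text{-}k, i\text{-}k \in Q\}\bigr|.$$

In other words, $A(\Gamma)$ and $T(\Gamma)$ are respectively the numbers of angles and triangles in $\Gamma$.
Notice that each triangle contains three different angles and therefore $3T(\Gamma) \leq A(\Gamma)$.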

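Proposition 9 For every $\Gamma_1, \Gamma_2 \in \mathcal{C}_n$,

$$d_4(\Gamma_1, \Gamma_2) = |Q_1 \Delta Q_2| - \frac{1}{n+1} \Bigl( 2A(\Gamma_1 \cup \Gamma_2) - A(\Gamma_1) - A(\Gamma_2) - \bigl( 2T(\Gamma_1 \cup \Gamma_2) - T(\Gamma_1) - T(\Gamma_2) \bigr) \Bigr).$$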
Proof. For every $\Gamma = ([n], Q) \in \mathcal{C}_n$ we have that

$$H_\Gamma(3) = \binom{n+3}{n} - 3\, SF_1(I_\Gamma) - 3\, SF_2(I_\Gamma) - SF_3(I_\Gamma),$$

where $SF_1(I_\Gamma) = 0$ and $SF_2(I_\Gamma) = |Q|$. It remains to compute $SF_3(I_\Gamma)$:

(1) For every $i\text{-}j \in Q$, there are $(n - 2)$ square free monomials $x_i x_j x_k$ in $M(I_\Gamma)$: this makes
$(n - 2)|Q|$ such monomials.

(2) Now, if $i\text{-}j, j\text{-}k \in Q$ form an angle, the monomial $x_i x_j x_k$ was counted twice in (1):
therefore, to count these monomials only once, we must subtract $A(\Gamma)$.

(3) Finally, if the nodes $i, j, k$ form a triangle in $\Gamma$, then the monomial $x_i x_j x_k$ was counted
three times in (1) and it was subtracted three times in (2); therefore, to retrieve these
monomials, we must add $T(\Gamma)$ again.

Therefore

$$SF_3(I_\Gamma) = (n - 2)|Q| - A(\Gamma) + T(\Gamma)$$

and

$$H_\Gamma(3) = \binom{n+3}{n} - (n + 1)|Q| + A(\Gamma) - T(\Gamma).$$

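We have then

$$\begin{aligned}
d'_4(\Gamma_1, \Gamma_2) &= H_{\Gamma_1}(3) + H_{\Gamma_2}(3) - 2 H_{\Gamma_1 \cup \Gamma_2}(3) \\
&= \binom{n+3}{n} - (n+1)|Q_1| + A(\Gamma_1) - T(\Gamma_1) + \binom{n+3}{n} - (n+1)|Q_2| + A(\Gamma_2) - T(\Gamma_2) \\
&\quad - 2\Bigl( \binom{n+3}{n} - (n+1)|Q_1 \cup Q_2| + A(\Gamma_1 \cup \Gamma_2) - T(\Gamma_1 \cup \Gamma_2) \Bigr) \\
&= (n+1)|Q_1 \Delta Q_2| - \bigl( 2A(\Gamma_1 \cup \Gamma_2) - A(\Gamma_1) - A(\Gamma_2) \bigr) + \bigl( 2T(\Gamma_1 \cup \Gamma_2) - T(\Gamma_1) - T(\Gamma_2) \bigr);
\end{aligned}$$

in the last equality we have used that $2|Q_1 \cup Q_2| - |Q_1| - |Q_2| = |Q_1 \Delta Q_2|$.

Dividing by $n + 1$ this last expression for $d'_4(\Gamma_1, \Gamma_2)$, we obtain the expression for $d_4(\Gamma_1, \Gamma_2)$
given in the statement. ■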
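
The inclusion-exclusion count in the proof gives $H_\Gamma(3) = \binom{n+3}{n} - (n+1)|Q| + A(\Gamma) - T(\Gamma)$, which can be tested against a brute-force enumeration, including on structures that contain a triangle. The sketch below is an illustration with hypothetical helper names and two small contact structures of length 5 chosen by us.

```python
from itertools import combinations, combinations_with_replacement
from math import comb


def angles(contacts):
    """A(Γ): unordered pairs of distinct contacts that share a node."""
    return sum(1 for c1, c2 in combinations(sorted(contacts), 2) if set(c1) & set(c2))


def triangles(n, contacts):
    """T(Γ): 3-subsets of [n] whose three pairs are all contacts."""
    edges = {frozenset(c) for c in contacts}
    return sum(
        1
        for t in combinations(range(1, n + 1), 3)
        if all(frozenset(p) in edges for p in combinations(t, 2))
    )


def hilbert3_formula(n, contacts):
    """H_Γ(3) through the counts of contacts, angles and triangles."""
    return comb(n + 3, n) - (n + 1) * len(contacts) + angles(contacts) - triangles(n, contacts)


def hilbert3_bruteforce(n, contacts):
    """H_Γ(3) by enumerating monomials of total degree <= 3 outside the edge ideal."""
    return sum(
        not any(i in mo and j in mo for i, j in contacts)
        for deg in range(4)
        for mo in combinations_with_replacement(range(1, n + 1), deg)
    )


triangle = {(1, 3), (3, 5), (1, 5)}
triangle_plus = {(1, 3), (3, 5), (1, 5), (2, 4)}
results = [
    (hilbert3_formula(5, q), hilbert3_bruteforce(5, q)) for q in (triangle, triangle_plus)
]
```

For the triangle alone, $A = 3$ and $T = 1$, and both computations give $H_\Gamma(3) = 40$.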

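A simple computation shows that, for every $\Gamma_1, \Gamma_2 \in \mathcal{C}_n$,

$$\begin{aligned}
2A(\Gamma_1 \cup \Gamma_2) - A(\Gamma_1) - A(\Gamma_2)
&= \bigl|\{\{i\text{-}j, j\text{-}k\} \mid i \neq k,\ (i\text{-}j, j\text{-}k \in Q_s) \text{ and } (i\text{-}j \text{ or } j\text{-}k \notin Q_t), \text{ for some } \{s, t\} = \{1, 2\}\}\bigr| \\
&\quad + 2\,\bigl|\{\{i\text{-}j, j\text{-}k\} \mid i\text{-}j \in Q_1 - Q_2,\ j\text{-}k \in Q_2 - Q_1\}\bigr|,
\end{aligned}$$

$$\begin{aligned}
2T(\Gamma_1 \cup \Gamma_2) - T(\Gamma_1) - T(\Gamma_2)
&= \bigl|\{\{i, j, k\} \mid (i\text{-}j, j\text{-}k, i\text{-}k \in Q_s) \text{ and } (i\text{-}j, j\text{-}k \text{ or } i\text{-}k \notin Q_t), \text{ for some } \{s, t\} = \{1, 2\}\}\bigr| \\
&\quad + 2\,\bigl|\{\{i, j, k\} \mid i\text{-}j, j\text{-}k, i\text{-}k \in Q_1 \cup Q_2, \text{ with one of them in } Q_1 - Q_2 \text{ and another one in } Q_2 - Q_1\}\bigr|.
\end{aligned}$$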

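Example 10 Let $\Gamma_0 = ([9], \{1\text{-}3, 4\text{-}6\})$, and consider the following "modifications" of it:

$$\begin{aligned}
\Gamma_1 &= ([9], \{1\text{-}3, 4\text{-}6, 7\text{-}9\}), & \Gamma_2 &= ([9], \{1\text{-}3, 4\text{-}6, 6\text{-}9\}), & \Gamma_3 &= ([9], \{1\text{-}3, 4\text{-}6, 1\text{-}6\}),\\
\Gamma_4 &= ([9], \{1\text{-}3, 4\text{-}7\}), & \Gamma_5 &= ([9], \{1\text{-}3, 3\text{-}6\}), & \Gamma_6 &= ([9], \{1\text{-}3, 3\text{-}5\}),\\
\Gamma_7 &= ([9], \{1\text{-}3, 5\text{-}7\}).
\end{aligned}$$

The contact structures $\Gamma_1$, $\Gamma_2$ and $\Gamma_3$ are obtained by adding a contact to $\Gamma_0$ in three different ways, $\Gamma_4$ and $\Gamma_5$ are obtained by shifting the contact 4-6 in two different ways, and $\Gamma_6$ and $\Gamma_7$ are obtained by displacing this contact in two more ways. Notice that $\Gamma_0$, $\Gamma_1$, $\Gamma_4$ and $\Gamma_7$ are RNA secondary structures, but not the others.
We have that

$$\begin{aligned}
&d_3(\Gamma_0,\Gamma_1) = d_3(\Gamma_0,\Gamma_2) = d_3(\Gamma_0,\Gamma_3) = 1,\\
&d_3(\Gamma_0,\Gamma_4) = d_3(\Gamma_0,\Gamma_5) = d_3(\Gamma_0,\Gamma_6) = d_3(\Gamma_0,\Gamma_7) = 2,
\end{aligned}$$

while

$$\begin{aligned}
&d_4(\Gamma_0,\Gamma_1) = 1,\quad d_4(\Gamma_0,\Gamma_2) = 0.9,\quad d_4(\Gamma_0,\Gamma_3) = 0.8,\\
&d_4(\Gamma_0,\Gamma_4) = 1.8,\quad d_4(\Gamma_0,\Gamma_5) = 1.7,\quad d_4(\Gamma_0,\Gamma_6) = 1.9,\quad d_4(\Gamma_0,\Gamma_7) = 2.
\end{aligned}$$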
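
The $d_4$ values of Example 10 can be recomputed mechanically. The sketch below is not part of the original paper and its function names are hypothetical. It evaluates $d'_4(\Gamma_1,\Gamma_2) = H_{\Gamma_1}(3) + H_{\Gamma_2}(3) - 2H_{\Gamma_1\cup\Gamma_2}(3)$ with the closed form for $H_\Gamma(3)$ from the proof of Proposition 9 and then divides by $n+1$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def h3(n, contacts):
    """H_Gamma(3) = C(n+3, 3) - (n+1)|Q| + A(Gamma) - T(Gamma)."""
    q = {frozenset(c) for c in contacts}
    # Angles: pairs of distinct contacts sharing a node.
    a = sum(1 for x, y in combinations(q, 2) if x & y)
    # Triangles: triples of nodes that are pairwise in contact.
    t = sum(
        1
        for tri in combinations(range(1, n + 1), 3)
        if all(frozenset(p) in q for p in combinations(tri, 2))
    )
    return comb(n + 3, 3) - (n + 1) * len(q) + a - t


def d4(n, q1, q2):
    """d_4 = (H_1(3) + H_2(3) - 2 H_union(3)) / (n + 1), as an exact fraction."""
    union = {frozenset(c) for c in q1} | {frozenset(c) for c in q2}
    return Fraction(h3(n, q1) + h3(n, q2) - 2 * h3(n, union), n + 1)


gamma0 = [(1, 3), (4, 6)]
variants = {
    1: [(1, 3), (4, 6), (7, 9)],
    2: [(1, 3), (4, 6), (6, 9)],
    3: [(1, 3), (4, 6), (1, 6)],
    4: [(1, 3), (4, 7)],
    5: [(1, 3), (3, 6)],
    6: [(1, 3), (3, 5)],
    7: [(1, 3), (5, 7)],
}
for idx, contacts in variants.items():
    print(f"d4(Gamma_0, Gamma_{idx}) = {float(d4(9, gamma0, contacts))}")
```

Exact fractions are used internally so that the comparison between structures does not depend on floating-point rounding; the conversion to `float` is only for display.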

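The expression for $d_4$ on RNA secondary structures is much simpler. Recall that, for every $\Gamma_1, \Gamma_2 \in \mathcal{S}_n$, $\Lambda_{\geq 2}$ stands for the number of linear orbits of length at least 2, i.e., of non-trivial linear orbits induced by the action of $D(\Gamma_1, \Gamma_2)$ on $[n]$.

Proposition 11 For every $\Gamma_1, \Gamma_2 \in \mathcal{S}_n$,

$$\begin{aligned}
d_4(\Gamma_1,\Gamma_2) &= |Q_1\,\Delta\,Q_2| - \frac{2}{n+1}\,A(\Gamma_1\cup\Gamma_2)\\
&= |Q_1\,\Delta\,Q_2| - \frac{2}{n+1}\,\big(|Q_1\,\Delta\,Q_2| - \Lambda_{\geq 2}\big).
\end{aligned}$$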

Proof. Notice that the unique bonds condition implies in this case that

$$A(\Gamma_1) = A(\Gamma_2) = T(\Gamma_1) = T(\Gamma_2) = T(\Gamma_1\cup\Gamma_2) = 0,$$

from which the first equality follows. It remains to prove that if $\Gamma_1, \Gamma_2 \in \mathcal{S}_n$, then

$$A(\Gamma_1\cup\Gamma_2) = |Q_1\,\Delta\,Q_2| - \Lambda_{\geq 2}.$$

To prove it, notice that if $\{i\text{-}j, j\text{-}k\}$ forms an angle in $\Gamma_1\cup\Gamma_2$, then, again by the unique bonds condition, one of these contacts must belong to $Q_1 - Q_2$ and the other one to $Q_2 - Q_1$, and the nodes $i, j, k$ belong to the same orbit of length at least 3. Now, each cyclic orbit of length $m > 2$ contains $m$ such pairs of contacts, while any linear orbit of length $m \geq 2$ contains $m - 2$ such pairs. Then

$$\begin{aligned}
A(\Gamma_1\cup\Gamma_2) &= \sum_{m\geq 4} m\,\Theta^{(m)} + \sum_{m\geq 2} (m-2)\,\Lambda^{(m)}\\
&= \sum_{m\geq 4} m\,\Theta^{(m)} + \sum_{m\geq 2} (m-1)\,\Lambda^{(m)} - \sum_{m\geq 2} \Lambda^{(m)} = |Q_1\,\Delta\,Q_2| - \Lambda_{\geq 2},
\end{aligned}$$

as we wanted to prove. ■

Therefore, on $\mathcal{S}_n$, the metric $d_4$ increases with the cardinal of $Q_1\,\Delta\,Q_2$, but decreases with the number of pairs of contacts in $Q_1\,\Delta\,Q_2$ that share a node. Notice moreover that

$$0 \leq |Q_1\,\Delta\,Q_2| - \Lambda_{\geq 2} \leq |Q_1\,\Delta\,Q_2|;$$

the lower bound is achieved when all non-trivial orbits are linear of length 2 (i.e., when $\Gamma_1\cup\Gamma_2$ is again an RNA secondary structure), and the upper bound when all non-trivial orbits are cyclic (i.e., when $\Gamma_1$ and $\Gamma_2$ have exactly the same isolated nodes).

Example 12 According to [22, §5.3.2], most RNA AU-sequences of length 16 do not fold at all, i.e., they form the empty RNA secondary structure $\Gamma_0 = ([16], \emptyset)$. But some of these sequences (about 3% of them) do fold, forming one of the following three RNA secondary structures without pseudoknots:

    $\Gamma_1$ : .((((((...))))))
    $\Gamma_2$ : ((((((...)))))).
    $\Gamma_3$ : ((((((....))))))

Recall that these are the bracket representations of

$$\begin{aligned}
\Gamma_1 &= ([16], \{2\text{-}16, 3\text{-}15, 4\text{-}14, 5\text{-}13, 6\text{-}12, 7\text{-}11\}),\\
\Gamma_2 &= ([16], \{1\text{-}15, 2\text{-}14, 3\text{-}13, 4\text{-}12, 5\text{-}11, 6\text{-}10\}),\\
\Gamma_3 &= ([16], \{1\text{-}16, 2\text{-}15, 3\text{-}14, 4\text{-}13, 5\text{-}12, 6\text{-}11\}).
\end{aligned}$$

They have pairwise disjoint sets of contacts, and hence

$$d_3(\Gamma_1,\Gamma_2) = d_3(\Gamma_1,\Gamma_3) = d_3(\Gamma_2,\Gamma_3) = 12.$$

But

$$d_4(\Gamma_1,\Gamma_2) = \frac{184}{17},\qquad d_4(\Gamma_1,\Gamma_3) = d_4(\Gamma_2,\Gamma_3) = \frac{182}{17},$$

which shows that, under $d_4$, $\Gamma_1$ and $\Gamma_2$ are closer to $\Gamma_3$ than to each other.
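
The computation in Example 12 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the non-trivial linear orbits are identified with the path-shaped connected components of the graph whose edges are the contacts in $Q_1\,\Delta\,Q_2$ (contacts shared by both structures are therefore not counted as linear orbits, which is what the identity $A(\Gamma_1\cup\Gamma_2) = |Q_1\,\Delta\,Q_2| - \Lambda_{\geq 2}$ requires).

```python
from fractions import Fraction


def parse(dot_bracket):
    """Contacts (i, j), i < j, 1-based, of a pseudoknot-free bracket string."""
    stack, contacts = [], set()
    for pos, ch in enumerate(dot_bracket, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            contacts.add((stack.pop(), pos))
    return contacts


def linear_orbits(q1, q2):
    """Lambda_{>=2}: number of path-shaped components spanned by Q1 delta Q2."""
    adj = {}
    for i, j in q1 ^ q2:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        edges = sum(len(adj[v]) for v in comp) // 2
        if edges == len(comp) - 1:  # a path; a cycle has as many edges as nodes
            count += 1
    return count


def d4_rna(n, q1, q2):
    """Proposition 11: d_4 = |Q1 delta Q2| - 2/(n+1) * (|Q1 delta Q2| - Lambda_{>=2})."""
    diff = len(q1 ^ q2)
    return diff - Fraction(2, n + 1) * (diff - linear_orbits(q1, q2))


g1 = parse(".((((((...))))))")
g2 = parse("((((((...)))))).")
g3 = parse("((((((....))))))")
print(d4_rna(16, g1, g2), d4_rna(16, g1, g3), d4_rna(16, g2, g3))
```

In this example $\Gamma_1\cup\Gamma_2$ splits into two linear orbits while each of the other two unions forms a single one, which is why the first distance is the largest.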

Example 13 Let us consider a hairpin with an interior loop $\Gamma_0 \in \mathcal{S}_{21}$ with bracket representation

    ((.((((...)))).....))

One possible rearrangement of this secondary structure splits the hairpin into a multibranched loop by means of two shift moves, yielding a multibranched structure:

    $\Gamma_0$ : ((.((((...)))).....))  $\Rightarrow$  $\Gamma_1$ : ((..(((...)))(...).))  $\Rightarrow$  $\Gamma_2$ : ((...((...))((...))))

Another possible rearrangement through shift moves widens the final loop of the hairpin by moving a one-nucleotide bulge:

    $\Gamma_0$ : ((.((((...)))).....))  $\Rightarrow$  $\Gamma'_1$ : (((.(((...)))).....))  $\Rightarrow$  $\Gamma'_2$ : ((((.((...)))).....))
    $\Rightarrow$  $\Gamma'_3$ : (((((.(...)))).....))  $\Rightarrow$  $\Gamma'_4$ : ((((((....)))).....))

Now notice that

$$d_3(\Gamma_0,\Gamma_1) = d_3(\Gamma_0,\Gamma'_1) = 2,\qquad d_3(\Gamma_0,\Gamma_2) = d_3(\Gamma_0,\Gamma'_2) = 4.$$

But, although

$$d_4(\Gamma_0,\Gamma_1) = d_4(\Gamma_0,\Gamma'_1) = \frac{21}{11},$$

it turns out that

$$d_4(\Gamma_0,\Gamma_2) = \frac{42}{11},\qquad d_4(\Gamma_0,\Gamma'_2) = \frac{41}{11}.$$

Therefore, under $d_4$, $\Gamma'_2$ is closer to $\Gamma_0$ than $\Gamma_2$.

As $m$ grows, the description of $d_m$ on $\mathcal{C}_n$ gets more and more involved or, if we want it to remain simple, more and more uninformative. The same happens on $\mathcal{S}_n$, but at a slower pace. Therefore, from now on, we shall only consider edge metrics on RNA secondary structures.

We have a closed formula for the Hilbert function of the edge ideal of an RNA secondary structure, given by the following result, which we consider interesting in itself. In it we use the convention that $\binom{0}{0} = 1$ and $\binom{0}{j} = 0$ if $j > 0$.

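Proposition 14 For every $\Gamma = ([n], Q) \in \mathcal{S}_n$ and for every $m \geq 0$,

$$H_\Gamma(m) = \sum_{j=0}^{\lfloor m/2\rfloor} (-1)^j \binom{|Q|}{j}\binom{n+m-2j}{m-2j}.$$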

Proof. To begin with, notice that the Hilbert function $H_\Gamma$ only depends on $|Q|$ because, for every two RNA secondary structures $\Gamma_1$, $\Gamma_2$ with the same number of contacts, their edge ideals are the same up to a permutation of the variables and thus $|C(I_{\Gamma_1})_m| = |C(I_{\Gamma_2})_m|$ for every $m \geq 0$. For every $k \geq 0$, let $H_k$ denote the Hilbert function of the edge ideal of any $\Gamma \in \mathcal{S}_n$ with $|Q| = k$.

Now, notice that if $\Gamma = ([n], Q) \in \mathcal{S}_n$ with $Q = \{i_1\text{-}j_1, \ldots, i_k\text{-}j_k\}$, then no monomial $x_{i_t}x_{j_t}$, for $t = 1, \ldots, k$, is a zero divisor modulo the ideal $\langle x_{i_1}x_{j_1}, \ldots, x_{i_{t-1}}x_{j_{t-1}}\rangle$. This implies, by [6, §9.4, Cor. 5] (or, rather, its proof) that

$$H_{k+1}(m) = H_k(m) - H_k(m-2),\quad\text{for every } k \geq 0,\ m \geq 2,$$

and hence

$$H_{k+1}(m) = H_0(m) - \sum_{i=0}^{k} H_i(m-2),\quad\text{for every } k \geq 0,\ m \geq 2;$$

we shall use this recursion to prove the expression in the statement by induction on $m$.
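To begin with, we know that

$$H_0(m) = |M(x)_m| = \binom{n+m}{m},\quad\text{for every } m \geq 0,$$

which clearly satisfies the expression in the statement (with $|Q| = 0$). Moreover,

$$H_k(0) = 1,\quad H_k(1) = n+1\quad\text{for every } k \geq 0,$$

because, for every $\Gamma \in \mathcal{S}_n$,

$$C(I_\Gamma)_0 = \{1\},\quad C(I_\Gamma)_1 = \{1, x_1, \ldots, x_n\}.$$

These values for $H_k(0)$ and $H_k(1)$ clearly satisfy the expression given in the statement. Now, as induction hypothesis, assume that

$$H_k(m_0) = \sum_{j=0}^{\lfloor m_0/2\rfloor} (-1)^j\binom{k}{j}\binom{n+m_0-2j}{m_0-2j}\quad\text{for every } k \geq 0.$$

Then, for every $k \geq 0$,

$$\begin{aligned}
H_{k+1}(m_0+2) &= H_0(m_0+2) - \sum_{i=0}^{k} H_i(m_0)\\
&= \binom{n+m_0+2}{m_0+2} - \sum_{i=0}^{k}\sum_{j=0}^{\lfloor m_0/2\rfloor} (-1)^j\binom{i}{j}\binom{n+m_0-2j}{m_0-2j}\\
&= \binom{n+m_0+2}{m_0+2} + \sum_{j=0}^{\lfloor m_0/2\rfloor} (-1)^{j+1}\binom{n+m_0-2j}{m_0-2j}\Big(\sum_{i=0}^{k}\binom{i}{j}\Big)\\
&= \binom{n+m_0+2}{m_0+2} + \sum_{j=0}^{\lfloor m_0/2\rfloor} (-1)^{j+1}\binom{n+m_0-2j}{m_0-2j}\binom{k+1}{j+1}\\
&= (-1)^0\binom{k+1}{0}\binom{n+m_0+2}{m_0+2} + \sum_{j=1}^{\lfloor m_0/2\rfloor+1} (-1)^{j}\binom{k+1}{j}\binom{n+m_0+2-2j}{m_0+2-2j}\\
&= \sum_{j=0}^{\lfloor (m_0+2)/2\rfloor} (-1)^j\binom{k+1}{j}\binom{n+m_0+2-2j}{m_0+2-2j},
\end{aligned}$$

where the second equality uses the induction hypothesis and the fourth equality uses that

$$\sum_{i=0}^{k}\binom{i}{j} = \binom{k+1}{j+1}.$$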

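Thus, for instance, for every $\Gamma = ([n], Q) \in \mathcal{S}_n$,

$$\begin{aligned}
H_\Gamma(2) &= \binom{n+2}{2} - |Q|,\\
H_\Gamma(3) &= \binom{n+3}{3} - (n+1)|Q|,\\
H_\Gamma(4) &= \binom{n+4}{4} - \binom{n+2}{2}|Q| + \binom{|Q|}{2},\\
H_\Gamma(5) &= \binom{n+5}{5} - \binom{n+3}{3}|Q| + (n+1)\binom{|Q|}{2},\\
H_\Gamma(6) &= \binom{n+6}{6} - \binom{n+4}{4}|Q| + \binom{n+2}{2}\binom{|Q|}{2} - \binom{|Q|}{3}.
\end{aligned}$$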
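
The instances listed above all follow the pattern $\sum_{j} (-1)^j \binom{|Q|}{j}\binom{n+m-2j}{m-2j}$, and this can be compared with a direct count. The following Python sketch is illustrative only (hypothetical function names and test structure, not code from the paper); the brute-force routine assumes that $H_\Gamma(m)$ counts the monomials of total degree at most $m$ outside the edge ideal.

```python
from itertools import combinations
from math import comb


def hilbert_closed(n, k, m):
    """Closed formula for an RNA secondary structure of length n with k contacts."""
    return sum(
        (-1) ** j * comb(k, j) * comb(n + m - 2 * j, m - 2 * j)
        for j in range(m // 2 + 1)
    )


def hilbert_brute(n, contacts, m):
    """Count monomials of total degree <= m whose support contains no contact."""
    edges = {frozenset(c) for c in contacts}
    total = 0
    for size in range(min(m, n) + 1):
        for support in combinations(range(1, n + 1), size):
            if not any(frozenset(p) in edges for p in combinations(support, 2)):
                # Exponents >= 1 on `size` variables with total degree <= m.
                total += comb(m, size)
    return total


# Hypothetical hairpin of length 8 with three stacked contacts.
hairpin = [(1, 8), (2, 7), (3, 6)]
for m in range(7):
    print(m, hilbert_closed(8, len(hairpin), m), hilbert_brute(8, hairpin, m))
```

`math.comb(k, j)` returns 0 when `j > k` and 1 when both arguments are 0, which matches the convention on binomial coefficients adopted before Proposition 14.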

Unfortunately, we do not have a similar explicit expression for the Hilbert function of arbitrary contact structures, including unions of RNA secondary structures, and then we still use Lemma 8 to compute the Hilbert functions of the latter.

To close this paper, we shall provide explicit descriptions of $d_5$ and $d_6$ on the set $\mathcal{S}_n$ just to grasp what they measure. Their proofs are simple, but long and technically involved, and we delay them until the Appendix at the end of this paper.

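Proposition 15 For every $\Gamma_1, \Gamma_2 \in \mathcal{S}_n$,

$$d_5(\Gamma_1,\Gamma_2) = |Q_1\,\Delta\,Q_2| - \frac{1}{\binom{n+2}{2}}\Big(2(n-1)\big(|Q_1\,\Delta\,Q_2| - \Lambda_{\geq 2}\big) + 2\binom{|Q_1\cup Q_2|}{2} - \binom{|Q_1|}{2} - \binom{|Q_2|}{2} + 2\big(\Lambda_{\geq 3} + \Theta^{(4)}\big)\Big).$$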
Example 16 Consider the RNA secondary structures of length 15

    $\Gamma_1$ : (.).(.).(.).(.)
    $\Gamma_2$ : ....(.(.).(.).)
    $\Gamma_3$ : ..(.).(.).(.)..

Then

$$d_3(\Gamma_1,\Gamma_2) = d_3(\Gamma_1,\Gamma_3) = 7$$

and

$$d_4(\Gamma_1,\Gamma_2) = d_4(\Gamma_1,\Gamma_3) = \frac{25}{4},$$

but

$$d_5(\Gamma_1,\Gamma_2) = 5 + \frac{71}{136},\qquad d_5(\Gamma_1,\Gamma_3) = 5 + \frac{69}{136}.$$

Thus, under $d_5$, $\Gamma_3$ is closer to $\Gamma_1$ than $\Gamma_2$ is.

It is interesting to observe that, contrary to what happens with $d_3$ and $d_4$, the term $2\binom{|Q_1\cup Q_2|}{2} - \binom{|Q_1|}{2} - \binom{|Q_2|}{2}$ makes the value of $d_5(\Gamma_1,\Gamma_2)$ depend not only on the cardinal and structure of the set $Q_1\,\Delta\,Q_2$, but also on $|Q_1\cap Q_2|$. For instance, it is not difficult to check that if $\Gamma_1, \Gamma_2, \Gamma'_1, \Gamma'_2 \in \mathcal{S}_n$ are such that $Q_1 - Q_2 = Q'_1 - Q'_2$ and $Q_2 - Q_1 = Q'_2 - Q'_1$, then

$$d_5(\Gamma_1,\Gamma_2) < d_5(\Gamma'_1,\Gamma'_2) \iff |Q_1\cap Q_2| > |Q'_1\cap Q'_2|;$$

i.e., the greater the set of contacts they share is, the closer they are.
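
The values of $d_5$ in Example 16 can be re-derived from Proposition 15 with a short script. The Python sketch below is a hedged illustration rather than the authors' code: all function names are hypothetical, and the orbits are read off the graph spanned by the contacts in $Q_1\,\Delta\,Q_2$ (paths are taken as linear orbits, cycles as cyclic orbits), which is an assumption about how the orbits of $D(\Gamma_1,\Gamma_2)$ look on RNA secondary structures.

```python
from fractions import Fraction
from math import comb


def parse(dot_bracket):
    """Contacts (i, j), i < j, 1-based, of a pseudoknot-free bracket string."""
    stack, contacts = [], set()
    for pos, ch in enumerate(dot_bracket, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            contacts.add((stack.pop(), pos))
    return contacts


def orbits(q1, q2):
    """Non-trivial orbits as (is_cyclic, length) pairs, built from Q1 delta Q2."""
    adj = {}
    for i, j in q1 ^ q2:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        edges = sum(len(adj[v]) for v in comp) // 2
        result.append((edges == len(comp), len(comp)))  # cycle: edges == nodes
    return result


def d5_rna(n, q1, q2):
    """Proposition 15, evaluated exactly."""
    diff, orb = len(q1 ^ q2), orbits(q1, q2)
    lam2 = sum(1 for cyc, m in orb if not cyc and m >= 2)
    lam3 = sum(1 for cyc, m in orb if not cyc and m >= 3)
    theta4 = sum(1 for cyc, m in orb if cyc and m == 4)
    inner = (
        2 * (n - 1) * (diff - lam2)
        + 2 * comb(len(q1 | q2), 2) - comb(len(q1), 2) - comb(len(q2), 2)
        + 2 * (lam3 + theta4)
    )
    return diff - Fraction(inner, comb(n + 2, 2))


g1 = parse("(.).(.).(.).(.)")
g2 = parse("....(.(.).(.).)")
g3 = parse("..(.).(.).(.)..")
for name, other in (("Gamma_2", g2), ("Gamma_3", g3)):
    print(f"d5(Gamma_1, {name}) =", d5_rna(15, g1, other))
```

Here $\Gamma_1\cup\Gamma_2$ consists of a cyclic orbit of length 6 and a linear orbit of length 2, whereas $\Gamma_1\cup\Gamma_3$ is a single linear orbit of length 8; only the latter contributes to $\Lambda_{\geq 3}$, which accounts for the difference between the two values.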

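Example 17 Consider again the hairpin with an interior loop $\Gamma_0$ and its rearrangement $\Gamma_1$ given in Example 13:

    $\Gamma_0$ : ((.((((...)))).....))
    $\Gamma_1$ : ((..(((...)))(...).))

Let now $\Gamma'_0$ and $\Gamma'_1$ be the RNA secondary structures of the same length 21 obtained by removing from $\Gamma_0$ and $\Gamma_1$ their outer stacked pair of contacts:

    $\Gamma'_0$ : ...((((...)))).......
    $\Gamma'_1$ : ....(((...)))(...)...

Since $Q_0\,\Delta\,Q_1 = Q'_0\,\Delta\,Q'_1 = \{4\text{-}14, 14\text{-}18\}$, we have that

$$d_3(\Gamma_0,\Gamma_1) = d_3(\Gamma'_0,\Gamma'_1) = 2,\qquad d_4(\Gamma_0,\Gamma_1) = d_4(\Gamma'_0,\Gamma'_1) = \frac{21}{11}.$$

But it turns out that

$$d_5(\Gamma_0,\Gamma_1) = 1 + \frac{199}{253},\qquad d_5(\Gamma'_0,\Gamma'_1) = 1 + \frac{203}{253}.$$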

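As far as $d_6$ goes, we have the following result.

Proposition 18 For every $\Gamma_1, \Gamma_2 \in \mathcal{S}_n$,

$$\begin{aligned}
d_6(\Gamma_1,\Gamma_2) = |Q_1\,\Delta\,Q_2| - \frac{1}{\binom{n+3}{3}}\bigg( &(n+1)\Big(2\binom{|Q_1\cup Q_2|}{2} - \binom{|Q_1|}{2} - \binom{|Q_2|}{2}\Big)\\
&+ 2\Big(\binom{n}{2} + 4 - |Q_1\cup Q_2|\Big)\big(|Q_1\,\Delta\,Q_2| - \Lambda_{\geq 2}\big) + 2(n-2)\Lambda_{\geq 3} - 2\Lambda_{\geq 4} + 2(n-3)\Theta^{(4)}\bigg).
\end{aligned}$$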

In a similar way, an explicit expression for $d_m$ on $\mathcal{S}_n$ can be obtained for every $m \geq 7$, yielding information about what these metrics measure; recall moreover that, for specific $\Gamma_1, \Gamma_2 \in \mathcal{S}_n$, the value of $d_m(\Gamma_1,\Gamma_2)$ can be easily computed using a suitable computer algebra system. Unfortunately, we have not been able to produce a closed expression for all these metrics.
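
For small structures, the value of $d_m$ on a specific pair can also be obtained without a computer algebra system, directly from $d'_m(\Gamma_1,\Gamma_2) = H_{\Gamma_1}(m-1) + H_{\Gamma_2}(m-1) - 2H_{\Gamma_1\cup\Gamma_2}(m-1)$. The following Python sketch is an illustration with hypothetical function names. Two points are assumptions: that $H_\Gamma(m)$ counts the monomials of total degree at most $m$ outside the edge ideal, and that the normalising constant is $\binom{n+m-3}{m-3}$, a value inferred here from the divisors $n+1$, $\binom{n+2}{2}$ and $\binom{n+3}{3}$ used for $m = 4, 5, 6$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def hilbert(n, contacts, m):
    """Count monomials of total degree <= m whose support contains no contact."""
    edges = {frozenset(c) for c in contacts}
    total = 0
    for size in range(min(m, n) + 1):
        for support in combinations(range(1, n + 1), size):
            if not any(frozenset(p) in edges for p in combinations(support, 2)):
                # Exponents >= 1 on `size` variables with total degree <= m.
                total += comb(m, size)
    return total


def edge_metric(n, q1, q2, m):
    """d_m from Hilbert functions; the divisor is an assumed normalisation."""
    q1 = {frozenset(c) for c in q1}
    q2 = {frozenset(c) for c in q2}
    raw = (
        hilbert(n, q1, m - 1)
        + hilbert(n, q2, m - 1)
        - 2 * hilbert(n, q1 | q2, m - 1)
    )
    return Fraction(raw, comb(n + m - 3, m - 3))


# Contact sets of the structures (.).(.).(.).(.) and ....(.(.).(.).) of length 15.
gamma1 = [(1, 3), (5, 7), (9, 11), (13, 15)]
gamma2 = [(5, 15), (7, 9), (11, 13)]
for m in (3, 4, 5, 6):
    print(f"d_{m} =", edge_metric(15, gamma1, gamma2, m))
```

The enumeration visits every subset of at most $m-1$ nodes, so it is meant for checking small cases; for longer sequences or larger $m$ a computer algebra system remains the practical choice.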
Notice that, when finding an expression for $d_m$, the only new ingredient that is necessary to determine is the coefficient $SF_{m-1}(I_{\Gamma_1\cup\Gamma_2})$, which can be done for each $m$ by counting carefully how many square-free monomials of total degree $m-1$ belong to $I_{\Gamma_1\cup\Gamma_2}$, as we do in this paper for $m = 4, 5, 6$. It is in this coefficient that new terms make their appearance in each $d_m$: when one balances the number of square-free monomials in $I_{\Gamma_1\cup\Gamma_2}$ of the form $x_{i_1}\cdots x_{i_{m-1}}$ such that $i_1\text{-}i_2, \ldots, i_{m-2}\text{-}i_{m-1} \in Q_1\cup Q_2$, the number $\Lambda_{\geq m-2}$ makes its first appearance, and if $m-1$ is even, then to counterbalance the number of square-free monomials $x_{i_1}\cdots x_{i_{m-1}}$ in $I_{\Gamma_1\cup\Gamma_2}$ such that $\{i_1, \ldots, i_{m-1}\}$ is a cyclic orbit, the number $\Theta^{(m-1)}$ must be used for the first time (cf. the proofs of Propositions 15 and 18 in the Appendix).

5 Conclusion



In the Discussion section of their paper [18], Reidys and Stadler, having pointed out that their group-based models and metrics cannot be used on arbitrary contact structures, ask "What if contacts are not unique as in the case of proteins?" Using edge ideals, we can represent arbitrary contact structures by means of monomial ideals of a polynomial ring, and we show that this representation generalizes the embedding of RNA secondary structures into the set of subgroups of $S_n$ proposed by Reidys and Stadler. We have then used this representation to define a family of edge ideal metrics on arbitrary contact structures, which can be easily computed using several freely available computer algebra systems, and we have studied their properties.

Edge ideals are not the only possible monomial ideal representations of arbitrary contact structures. For instance, we could associate to every contact structure $\Gamma = ([n], Q)$ the clique ideal $J_\Gamma$ of $\mathbb{F}_2[x_1, \ldots, x_n]$ generated by the set of monomials consisting of one square-free monomial $x_{i_1}\cdots x_{i_k}$ for each maximal non-trivial clique (complete subgraph) $\{i_1, \ldots, i_k\}$, with $k \geq 2$, of $\Gamma$. Notice that if $\Gamma$ is an RNA secondary structure, then $J_\Gamma = I_\Gamma$, but for arbitrary contact structures they can be different. For instance, if $\Gamma = ([5], \{1\text{-}3, 3\text{-}5, 1\text{-}5\})$, then

$$I_\Gamma = \langle x_1x_3,\ x_1x_5,\ x_3x_5\rangle\quad\text{while}\quad J_\Gamma = \langle x_1x_3x_5\rangle.$$

We see that the clique ideal $J_\Gamma$ captures information on the clusters of monomers in three-dimensional structures (for instance, base triplets and quartets in RNA structures) in a way different from $I_\Gamma$. These ideals can be used to define new metrics on arbitrary contact structures of a fixed length similar to the edge ideal metrics introduced here. We shall report on them in a subsequent paper.

Let us finally point out that another question of Reidys and Stadler remains open for our models as well as, to our knowledge, for theirs: "Is there any hope for extending or altering any of the above concepts in order to incorporate variable sizes of structures?"

Acknowledgments. We acknowledge with thanks X. Bordoy, J. Elias, J. Miro and G. Valiente for several discussions on the topic of this paper and for their comments on draft versions of it.

Appendix: Proof of Propositions 15 and 18

To simplify the proofs, we first establish a lemma that we shall use several times and that generalizes the computation of $A(\Gamma_1\cup\Gamma_2)$ carried out in the proof of Proposition 11.
For every $\Gamma_1, \Gamma_2 \in \mathcal{S}_n$ and for every $k \geq 2$, let

$$M_k = \big\{\{i_1\text{-}i_2, i_2\text{-}i_3, \ldots, i_k\text{-}i_{k+1}\} \subseteq Q_1\cup Q_2 \mid i_1, \ldots, i_{k+1} \text{ pairwise different}\big\},$$

and let $A_k$ be its cardinal. Notice that $A_2$ is equal to the number of angles $A(\Gamma_1\cup\Gamma_2)$ in $\Gamma_1\cup\Gamma_2$. To simplify the notations, from now on we shall systematically write $A_2$ instead of $A(\Gamma_1\cup\Gamma_2)$.

Lemma A For every $\Gamma_1, \Gamma_2 \in \mathcal{S}_n$ and for every $k \geq 2$,

$$A_k = |Q_1\,\Delta\,Q_2| - \sum_{m=4}^{k} m\,\Theta^{(m)} - \sum_{i=2}^{k} \Lambda_{\geq i}.$$

Proof. If $\{i_1\text{-}i_2, \ldots, i_k\text{-}i_{k+1}\} \in M_k$, then the nodes $i_1, i_2, \ldots, i_{k+1}$ belong to the same orbit, whose length will be at least $k+1$. Therefore, every cyclic orbit of length $m \leq k$ contributes no element to $M_k$, while every cyclic orbit of length $m \geq k+1$ adds $m$ new elements to it. On the other hand, each linear orbit of length $m \leq k$ contributes no element to $M_k$, while every linear orbit of length $m \geq k+1$ adds $m - k$ new elements to this set.
This shows that

$$\begin{aligned}
A_k &= \sum_{m>k} m\,\Theta^{(m)} + \sum_{m>k} (m-k)\,\Lambda^{(m)}\\
&= \sum_{m>k} m\,\Theta^{(m)} + \sum_{m\geq k} (m-1)\,\Lambda^{(m)} - (k-1)\sum_{m\geq k}\Lambda^{(m)}\\
&= |Q_1\,\Delta\,Q_2| - \sum_{m=4}^{k} m\,\Theta^{(m)} - \sum_{m=2}^{k-1} (m-1)\,\Lambda^{(m)} - (k-1)\sum_{m\geq k}\Lambda^{(m)}\\
&= |Q_1\,\Delta\,Q_2| - \sum_{m=4}^{k} m\,\Theta^{(m)} - \sum_{i=2}^{k}\Lambda_{\geq i}.
\end{aligned}$$

In particular, we obtain again that $A_2 = |Q_1\,\Delta\,Q_2| - \Lambda_{\geq 2}$, as we already saw in the proof of Proposition 11.
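
Lemma A lends itself to a direct numerical check: $A_k$ can be counted by enumerating the paths of $k$ contacts in $\Gamma_1\cup\Gamma_2$ and compared with the right-hand side of the lemma. The Python sketch below is illustrative only. Its function names are hypothetical, and it assumes that the non-trivial orbits correspond to the connected components of the graph with edge set $Q_1\,\Delta\,Q_2$ (paths as linear orbits, cycles as cyclic orbits).

```python
def parse(dot_bracket):
    """Contacts (i, j), i < j, 1-based, of a pseudoknot-free bracket string."""
    stack, contacts = [], set()
    for pos, ch in enumerate(dot_bracket, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            contacts.add((stack.pop(), pos))
    return contacts


def adjacency(contacts):
    adj = {}
    for i, j in contacts:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    return adj


def count_paths(contacts, k):
    """A_k by enumeration: paths of k contacts through k+1 distinct nodes."""
    adj = adjacency(contacts)

    def extend(path):
        if len(path) == k + 1:
            return 1
        return sum(extend(path + [w]) for w in adj[path[-1]] if w not in path)

    # Every undirected path is found once from each of its two ends.
    return sum(extend([v]) for v in adj) // 2


def orbits(q1, q2):
    """Non-trivial orbits as (is_cyclic, length) pairs, built from Q1 delta Q2."""
    adj = adjacency(q1 ^ q2)
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        edges = sum(len(adj[v]) for v in comp) // 2
        result.append((edges == len(comp), len(comp)))
    return result


def lemma_a(q1, q2, k):
    """Right-hand side of Lemma A."""
    orb = orbits(q1, q2)
    cyclic = sum(m for cyc, m in orb if cyc and 4 <= m <= k)
    linear = sum(1 for i in range(2, k + 1) for cyc, m in orb if not cyc and m >= i)
    return len(q1 ^ q2) - cyclic - linear


g1 = parse("(.).(.).(.).(.)")
g2 = parse("....(.(.).(.).)")
for k in range(2, 8):
    print(k, count_paths(g1 | g2, k), lemma_a(g1, g2, k))
```

The two columns printed for each $k$ should coincide; the example pair contains a cyclic orbit of length 6, so the count drops to zero as soon as $k \geq 6$.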

Proof of Proposition 15. To simplify the notations, we shall denote each $SF_i(I_{\Gamma_1\cup\Gamma_2})$ simply by $SF_i$.
We shall use the expression

$$d_5(\Gamma_1,\Gamma_2) = \frac{1}{\binom{n+2}{2}}\big(H_{\Gamma_1}(4) + H_{\Gamma_2}(4) - 2H_{\Gamma_1\cup\Gamma_2}(4)\big),$$

where we already know that

$$\begin{aligned}
H_{\Gamma_i}(4) &= \binom{n+4}{4} - \binom{n+2}{2}|Q_i| + \binom{|Q_i|}{2},\quad i = 1, 2,\\
H_{\Gamma_1\cup\Gamma_2}(4) &= \binom{n+4}{4} - (4\,SF_1 + 6\,SF_2 + 4\,SF_3 + SF_4)
\end{aligned}$$

with

$$SF_1 = 0,\quad SF_2 = |Q_1\cup Q_2|,\quad SF_3 = (n-2)|Q_1\cup Q_2| - A_2;$$

the value of $SF_3$ was obtained in the proof of Proposition 9: notice that $T(\Gamma_1\cup\Gamma_2) = 0$ and recall that $A(\Gamma_1\cup\Gamma_2) = A_2$. It remains to compute $SF_4$:

(1) For every $i\text{-}j \in Q_1\cup Q_2$, there are $\binom{n-2}{2}$ square-free monomials $x_ix_jx_kx_l \in M(I_{\Gamma_1\cup\Gamma_2})$. This makes $\binom{n-2}{2}|Q_1\cup Q_2|$ such monomials.

(2) Now, if $i\text{-}j, k\text{-}l \in Q_1\cup Q_2$ with $\{i,j\}\cap\{k,l\} = \emptyset$, then the monomial $x_ix_jx_kx_l$ is counted twice in (1). Therefore, we must subtract $\binom{|Q_1\cup Q_2|}{2} - A_2$ from the value given in (1).

(3) If $i\text{-}j, j\text{-}k \in Q_1\cup Q_2$ form an angle in $\Gamma_1\cup\Gamma_2$, then for every $l \notin \{i,j,k\}$ the monomial $x_ix_jx_kx_l$ is counted twice in (1). Thus, we must also subtract $(n-3)A_2$.

(4) If $i\text{-}j, j\text{-}k, k\text{-}l \in Q_1\cup Q_2$, with $i, j, k, l$ pairwise different, then the monomial $x_ix_jx_kx_l$ is counted three times in (1), then it is subtracted once in (2) and it is subtracted twice in (3). Therefore, to retrieve these monomials, we must add $A_3$.

(5) Finally, if $i\text{-}j, j\text{-}k, k\text{-}l, l\text{-}i \in Q_1\cup Q_2$, so that $\{i,j,k,l\}$ form a cyclic orbit of length 4, then the monomial $x_ix_jx_kx_l$ is counted four times in (1), it is subtracted twice in (2), it is subtracted four more times in (3) and it is added four times in (4). To balance these operations, we must subtract $\Theta^{(4)}$.

In all, this shows that

$$SF_4 = \binom{n-2}{2}|Q_1\cup Q_2| - \binom{|Q_1\cup Q_2|}{2} - (n-4)A_2 + A_3 - \Theta^{(4)}$$

and hence

$$H_{\Gamma_1\cup\Gamma_2}(4) = \binom{n+4}{4} - \binom{n+2}{2}|Q_1\cup Q_2| + \binom{|Q_1\cup Q_2|}{2} + nA_2 - A_3 + \Theta^{(4)}.$$

A simple computation shows then that

$$\begin{aligned}
d'_5(\Gamma_1,\Gamma_2) &= H_{\Gamma_1}(4) + H_{\Gamma_2}(4) - 2H_{\Gamma_1\cup\Gamma_2}(4)\\
&= 2\binom{n+4}{4} - \binom{n+2}{2}\big(|Q_1| + |Q_2|\big) + \binom{|Q_1|}{2} + \binom{|Q_2|}{2}\\
&\quad - 2\Big(\binom{n+4}{4} - \binom{n+2}{2}|Q_1\cup Q_2| + \binom{|Q_1\cup Q_2|}{2} + nA_2 - A_3 + \Theta^{(4)}\Big)\\
&= \binom{n+2}{2}|Q_1\,\Delta\,Q_2| - \Big(2\binom{|Q_1\cup Q_2|}{2} - \binom{|Q_1|}{2} - \binom{|Q_2|}{2} + 2nA_2 - 2A_3 + 2\Theta^{(4)}\Big).
\end{aligned}$$

Now, we know from Lemma A that

$$A_2 = |Q_1\,\Delta\,Q_2| - \Lambda_{\geq 2},\qquad A_3 = |Q_1\,\Delta\,Q_2| - \Lambda_{\geq 2} - \Lambda_{\geq 3}.$$

Replacing them in the expression obtained above for $d'_5(\Gamma_1,\Gamma_2)$, and dividing the resulting expression by $\binom{n+2}{2}$, we finally obtain

$$d_5(\Gamma_1,\Gamma_2) = |Q_1\,\Delta\,Q_2| - \frac{1}{\binom{n+2}{2}}\Big(2\binom{|Q_1\cup Q_2|}{2} - \binom{|Q_1|}{2} - \binom{|Q_2|}{2} + 2(n-1)\big(|Q_1\,\Delta\,Q_2| - \Lambda_{\geq 2}\big) + 2\big(\Lambda_{\geq 3} + \Theta^{(4)}\big)\Big),$$

as we wanted to prove. ■

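Proof of Proposition 18. To simplify the notations, we shall denote again $SF_i(I_{\Gamma_1\cup\Gamma_2})$ simply by $SF_i$.
In the expression

$$d_6(\Gamma_1,\Gamma_2) = \frac{1}{\binom{n+3}{3}}\big(H_{\Gamma_1}(5) + H_{\Gamma_2}(5) - 2H_{\Gamma_1\cup\Gamma_2}(5)\big),$$

we already know that

$$\begin{aligned}
H_{\Gamma_i}(5) &= \binom{n+5}{5} - \binom{n+3}{3}|Q_i| + (n+1)\binom{|Q_i|}{2},\quad i = 1, 2,\\
H_{\Gamma_1\cup\Gamma_2}(5) &= \binom{n+5}{5} - (5\,SF_1 + 10\,SF_2 + 10\,SF_3 + 5\,SF_4 + SF_5)
\end{aligned}$$

with

$$\begin{aligned}
SF_1 &= 0,\quad SF_2 = |Q_1\cup Q_2|,\quad SF_3 = (n-2)|Q_1\cup Q_2| - A_2,\\
SF_4 &= \binom{n-2}{2}|Q_1\cup Q_2| - \binom{|Q_1\cup Q_2|}{2} - (n-4)A_2 + A_3 - \Theta^{(4)}.
\end{aligned}$$

Let us compute now $SF_5$.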

(1) For every $i\text{-}j \in Q_1\cup Q_2$, there are $\binom{n-2}{3}$ square-free monomials $x_ix_jx_kx_lx_m \in M(I_{\Gamma_1\cup\Gamma_2})$. This makes $\binom{n-2}{3}|Q_1\cup Q_2|$ such monomials.

(2) Now, if $i\text{-}j, k\text{-}l \in Q_1\cup Q_2$ with $\{i,j\}\cap\{k,l\} = \emptyset$, then for every $m \notin \{i,j,k,l\}$ the monomial $x_ix_jx_kx_lx_m$ is counted twice in (1). Therefore, we must subtract $(n-4)\big(\binom{|Q_1\cup Q_2|}{2} - A_2\big)$ from (1).

(3) If $i\text{-}j, j\text{-}k \in Q_1\cup Q_2$, then for every $l, m \notin \{i,j,k\}$ the monomial $x_ix_jx_kx_lx_m$ is counted twice in (1). Therefore, we must also subtract $\binom{n-3}{2}A_2$ from (1).

(4) If $i\text{-}j, j\text{-}k, l\text{-}m \in Q_1\cup Q_2$ with $\{i,j,k\}\cap\{l,m\} = \emptyset$, the monomial $x_ix_jx_kx_lx_m$ is counted 3 times in (1), then it is subtracted twice in (2) and it is subtracted once again in (3). Therefore, to retrieve these monomials we must add

$$\big|\{\{i\text{-}j, j\text{-}k, l\text{-}m\} \mid i\text{-}j, j\text{-}k, l\text{-}m \in Q_1\cup Q_2,\ \{i,j,k\}\cap\{l,m\} = \emptyset\}\big|;$$

let us call for the moment $X$ this number.

(5) If $i\text{-}j, j\text{-}k, k\text{-}l \in Q_1\cup Q_2$, for every $m \notin \{i,j,k,l\}$ the monomial $x_ix_jx_kx_lx_m$ is counted 3 times in (1), then it is subtracted once in (2) and it is subtracted twice in (3). Therefore, to retrieve these monomials we must also add $(n-4)A_3$ monomials.

Notice now that every angle $\{i\text{-}j, j\text{-}k\}$ can be paired with any of the $|Q_1\cup Q_2| - 2$ remaining contacts. The third contact is either disjoint from the angle, which yields one of the $X$ sets counted in (4), or it extends the angle to a path of three contacts, and each such path arises in this way from exactly two of its angles. Hence

$$X + 2A_3 = (|Q_1\cup Q_2| - 2)A_2.$$

Therefore, (4) and (5) add jointly

$$(|Q_1\cup Q_2| - 2)A_2 + (n-6)A_3$$

monomials.

(6) If $i\text{-}j, j\text{-}k, k\text{-}l, l\text{-}i \in Q_1\cup Q_2$, i.e., if $\{i,j,k,l\}$ form a cyclic orbit of length 4, then for every $m \notin \{i,j,k,l\}$ the monomial $x_ix_jx_kx_lx_m$ is counted four times in (1), it is subtracted twice in (2), it is subtracted four more times in (3) and it is added four times in (5). Therefore, we must subtract $(n-4)\Theta^{(4)}$.

(7) Finally, if $i\text{-}j, j\text{-}k, k\text{-}l, l\text{-}m \in Q_1\cup Q_2$, with $i, j, k, l, m$ pairwise different, then the monomial $x_ix_jx_kx_lx_m$ is counted four times in (1), it is subtracted three times in (2), it is subtracted three more times in (3) and it is added twice in (4) and twice in (5). Therefore, we must subtract $A_4$.

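In all, this shows that

$$\begin{aligned}
SF_5 ={}& \binom{n-2}{3}|Q_1\cup Q_2| - (n-4)\Big(\binom{|Q_1\cup Q_2|}{2} - A_2\Big) - \binom{n-3}{2}A_2\\
&+ (|Q_1\cup Q_2| - 2)A_2 + (n-6)A_3 - (n-4)\Theta^{(4)} - A_4
\end{aligned}$$

and hence

$$\begin{aligned}
H_{\Gamma_1\cup\Gamma_2}(5) ={}& \binom{n+5}{5} - \binom{n+3}{3}|Q_1\cup Q_2| + (n+1)\binom{|Q_1\cup Q_2|}{2} + \Big(\binom{n+1}{2} - |Q_1\cup Q_2| + 2\Big)A_2\\
&- (n-1)A_3 + (n+1)\Theta^{(4)} + A_4.
\end{aligned}$$

Then

$$\begin{aligned}
d'_6(\Gamma_1,\Gamma_2) ={}& H_{\Gamma_1}(5) + H_{\Gamma_2}(5) - 2H_{\Gamma_1\cup\Gamma_2}(5)\\
={}& 2\binom{n+5}{5} - \binom{n+3}{3}\big(|Q_1| + |Q_2|\big) + (n+1)\Big(\binom{|Q_1|}{2} + \binom{|Q_2|}{2}\Big)\\
&- 2\bigg(\binom{n+5}{5} - \binom{n+3}{3}|Q_1\cup Q_2| + (n+1)\binom{|Q_1\cup Q_2|}{2}\\
&\qquad + \Big(\binom{n+1}{2} - |Q_1\cup Q_2| + 2\Big)A_2 - (n-1)A_3 + (n+1)\Theta^{(4)} + A_4\bigg)\\
={}& \binom{n+3}{3}|Q_1\,\Delta\,Q_2| - (n+1)\Big(2\binom{|Q_1\cup Q_2|}{2} - \binom{|Q_1|}{2} - \binom{|Q_2|}{2}\Big) - 2\Big(\binom{n+1}{2} + 2 - |Q_1\cup Q_2|\Big)A_2\\
&+ 2(n-1)A_3 - 2(n+1)\Theta^{(4)} - 2A_4.
\end{aligned}$$

Since we already know, by Lemma A, that

$$\begin{aligned}
A_2 &= |Q_1\,\Delta\,Q_2| - \Lambda_{\geq 2},\qquad A_3 = |Q_1\,\Delta\,Q_2| - \Lambda_{\geq 2} - \Lambda_{\geq 3},\\
A_4 &= |Q_1\,\Delta\,Q_2| - \Lambda_{\geq 2} - \Lambda_{\geq 3} - \Lambda_{\geq 4} - 4\Theta^{(4)},
\end{aligned}$$

when we replace these values in the expression obtained above for $d'_6(\Gamma_1,\Gamma_2)$ and we divide the resulting expression by $\binom{n+3}{3}$, we finally obtain

$$\begin{aligned}
d_6(\Gamma_1,\Gamma_2) = |Q_1\,\Delta\,Q_2| - \frac{1}{\binom{n+3}{3}}\bigg( &(n+1)\Big(2\binom{|Q_1\cup Q_2|}{2} - \binom{|Q_1|}{2} - \binom{|Q_2|}{2}\Big)\\
&+ 2\Big(\binom{n}{2} + 4 - |Q_1\cup Q_2|\Big)\big(|Q_1\,\Delta\,Q_2| - \Lambda_{\geq 2}\big) + 2(n-2)\Lambda_{\geq 3} - 2\Lambda_{\geq 4} + 2(n-3)\Theta^{(4)}\bigg)
\end{aligned}$$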